CONFIDENCE SETS FOR NONPARAMETRIC WAVELET REGRESSION

By Christopher R. Genovese¹ and Larry Wasserman²

Carnegie Mellon University 

We construct nonparametric confidence sets for regression functions using wavelets that are uniform over Besov balls. We consider both thresholding and modulation estimators for the wavelet coefficients. The confidence set is obtained by showing that a pivot process, constructed from the loss function, converges uniformly to a mean zero Gaussian process. Inverting this pivot yields a confidence set for the wavelet coefficients, and from this we obtain confidence sets on functionals of the regression curve.

1. Introduction. Wavelet regression is an effective method for estimating inhomogeneous functions. Donoho and Johnstone (1995a, b, 1998) showed that wavelet regression estimators based on nonlinear thresholding rules converge at the optimal rate simultaneously across a range of Besov and Triebel spaces. The practical implication is that, for denoising an inhomogeneous signal, wavelet thresholding outperforms linear techniques. See, for instance, Cai (1999), Cai and Brown (1998), Efromovich (1999), Johnstone and Silverman (2002) and Ogden (1997). However, confidence sets for wavelet estimators need not inherit the convergence rate of the function estimators. Indeed, Li (1989) shows that uniform nonparametric confidence sets for regression estimators decrease in radius at an $n^{-1/4}$ rate. With additional assumptions, Picard and Tribouley (2000) show that it is possible to get a faster rate for pointwise intervals.

In this paper we show how to construct uniform confidence sets for wavelet regression. More precisely, we construct a confidence sphere in the $\ell_2$-norm for the wavelet coefficients of a regression function $f$. We use the strategy of Beran and Dümbgen (1998), originating from an idea in Stein (1981), in which one constructs a confidence set by using the loss function as an asymptotic pivot. Specifically, let $\mu_1, \mu_2, \ldots$ be the coefficients of $f$ in the orthonormal wavelet basis $\phi_1, \phi_2, \ldots$, and let $(\hat\mu_1, \hat\mu_2, \ldots)$ be corresponding estimates that depend on a (possibly vector-valued) tuning parameter $\lambda$. Let $L_n(\lambda) = \sum_{i=1}^n(\hat\mu_i(\lambda) - \mu_i)^2$ be the loss function and let $S_n(\lambda)$ be an unbiased estimate of $L_n(\lambda)$. The Beran–Dümbgen strategy has the following steps:

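1. Show that the pivot process $B_n(\lambda) = \sqrt{n}\,(L_n(\lambda) - S_n(\lambda))$ converges weakly to a Gaussian process with covariance kernel $K(s,t)$.

2. Show that $B_n(\hat\lambda_n)$ also has a Gaussian limit, where $\hat\lambda_n$ minimizes $S_n(\lambda)$. This step follows from the previous step if $\hat\lambda_n$ is independent of the pivot process, or if $B_n(\hat\lambda_n)$ is stochastically very close to $B_n(\lambda_n)$ for an appropriate deterministic sequence $\lambda_n$.

3. Find a consistent estimator $\hat\tau_n^2$ of $K(\hat\lambda_n, \hat\lambda_n)$.

4. Conclude that
$$\mathcal{D}_n = \Big\{\mu^n : \frac{L_n(\hat\lambda_n) - S_n(\hat\lambda_n)}{\hat\tau_n/\sqrt{n}} \le z_\alpha\Big\} = \Big\{\mu^n : \sum_{i=1}^n(\hat\mu_i - \mu_i)^2 \le \frac{\hat\tau_n z_\alpha}{\sqrt{n}} + S_n(\hat\lambda_n)\Big\}$$
is an asymptotic $1-\alpha$ confidence set for the coefficients, where $z_\alpha$ denotes the upper-tail $\alpha$-quantile of a standard Normal and where $\hat\mu_\ell = \hat\mu_\ell(\hat\lambda_n)$.

5. It follows that
$$\mathcal{A}_n = \Big\{\sum_{\ell=1}^n\mu_\ell\phi_\ell : \mu^n \in \mathcal{D}_n\Big\}$$
is an asymptotic $1-\alpha$ confidence set for $f_n = \sum_{\ell=1}^n\mu_\ell\phi_\ell$.

6. With appropriate function-space assumptions, conclude that dilating $\mathcal{A}_n$ yields a confidence set for $f$.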
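Steps 4 and 5 reduce to a simple computation once $S_n(\hat\lambda_n)$ and an estimate $\hat\tau_n^2$ of $K(\hat\lambda_n, \hat\lambda_n)$ are available. The Python sketch below assumes both have already been computed. The helper names `ball_radius_sq` and `in_ball` are illustrative and do not come from any library. The standard library's `statistics.NormalDist` supplies $z_\alpha$.

```python
from statistics import NormalDist


def ball_radius_sq(sure_value, tau_hat, n, alpha=0.05):
    """Squared radius of D_n in step 4: tau_hat * z_alpha / sqrt(n) + S_n(lambda_hat)."""
    z_alpha = NormalDist().inv_cdf(1.0 - alpha)  # upper-tail alpha-quantile of N(0, 1)
    return tau_hat * z_alpha / n ** 0.5 + sure_value


def in_ball(mu_hat, mu, radius_sq):
    """True if the candidate coefficient vector mu lies in D_n (a ball centred at mu_hat)."""
    return sum((a - b) ** 2 for a, b in zip(mu_hat, mu)) <= radius_sq


# Hypothetical inputs: SURE value 0.01, tau_hat = sqrt(2), n = 1024.
r2 = ball_radius_sq(0.01, 2 ** 0.5, 1024)
print(r2)
```

Membership in $\mathcal{A}_n$ is then the same check applied to the first $n$ coefficients of a candidate function.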

The limit laws that we obtain, and thus the confidence sets, are uniform over Besov balls. The exact form of the limit law depends on how the $\mu_\ell$'s are estimated. We consider universal shrinkage [Donoho and Johnstone (1995a)], modulation estimators [Beran and Dümbgen (1998)] and a restricted form of SureShrink [Donoho and Johnstone (1995b)].

Having obtained the confidence set $\mathcal{A}_n$, we immediately get confidence sets for any functional $T(f)$. Specifically, $(\inf_{f\in\mathcal{A}_n}T(f), \sup_{f\in\mathcal{A}_n}T(f))$ is an asymptotic confidence set for $T(f)$. In fact, if $\mathcal{T}$ is a set of functionals, then the collection $\{(\inf_{f\in\mathcal{A}_n}T(f), \sup_{f\in\mathcal{A}_n}T(f)) : T \in \mathcal{T}\}$ provides simultaneous intervals for all the functionals in $\mathcal{T}$. If the functionals in $\mathcal{T}$ are point-evaluators, we obtain a confidence band for $f$; see Section 8 for a discussion of confidence bands. An alternative method for constructing confidence bands is given in Picard and Tribouley (2000). At the cost of additional assumptions, the confidence set $\mathcal{A}_n$ can be expanded to a confidence set for $f$.

In Section 2 we discuss the basic framework of wavelet regression. In 
Section 3 we give the formulas for the confidence sets with known variance. 
In Section 4 we extend the results to the unknown variance case. In Section 5 
we describe how to obtain confidence sets for functionals. In Section 6 we 
consider numerical examples. Finally, Section 7 contains technical results 
and Section 8 contains closing remarks. 

2. Wavelet regression. Let $\phi$ and $\psi$ be, respectively, a father and mother wavelet that generate the following complete orthonormal set in $L_2[0,1]$:
$$\phi_{J_0,k}(x) = 2^{J_0/2}\phi(2^{J_0}x - k), \qquad \psi_{j,k}(x) = 2^{j/2}\psi(2^jx - k),$$
for integers $j \ge J_0$ and $k$, where $J_0$ is fixed. Any function $f \in L_2[0,1]$ may be expanded as
$$f(x) = \sum_{k=0}^{2^{J_0}-1}\alpha_k\phi_{J_0,k}(x) + \sum_{j=J_0}^{\infty}\sum_{k=0}^{2^j-1}\beta_{j,k}\psi_{j,k}(x), \tag{1}$$
where $\alpha_k = \int f\phi_{J_0,k}$ and $\beta_{j,k} = \int f\psi_{j,k}$. For fixed $j$, we call $\beta_{j\cdot} = \{\beta_{j,k} : k = 0, \ldots, 2^j - 1\}$ the resolution-$j$ coefficients.
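Assume that
$$Y_i = f(x_i) + \sigma\epsilon_i, \qquad i = 1, \ldots, n,$$
where $f \in L_2[0,1]$, $x_i = i/n$ and the $\epsilon_i$ are iid standard Normals. (See Section 7 for details.) The goal is to estimate $f$ under squared error loss. We assume that $n = 2^{J_1+1}$ for some integer $J_1$, so that the coefficients up to resolution level $J_1$ number exactly $n$. Let
$$f_n(x) = \sum_{k=0}^{2^{J_0}-1}\alpha_k\phi_{J_0,k}(x) + \sum_{j=J_0}^{J_1}\sum_{k=0}^{2^j-1}\beta_{j,k}\psi_{j,k}(x) \tag{2}$$
denote the projection of $f$ onto the span of the first $n$ basis elements.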
Define empirical wavelet coefficients
$$\tilde\alpha_k = \sum_{i=1}^nY_i\int_{(i-1)/n}^{i/n}\phi_{J_0,k}(x)\,dx \approx \frac{1}{n}\sum_{i=1}^n\phi_{J_0,k}(x_i)Y_i \approx \alpha_k + \frac{\sigma}{\sqrt{n}}Z_k,$$
$$\tilde\beta_{j,k} = \sum_{i=1}^nY_i\int_{(i-1)/n}^{i/n}\psi_{j,k}(x)\,dx \approx \frac{1}{n}\sum_{i=1}^n\psi_{j,k}(x_i)Y_i \approx \beta_{j,k} + \frac{\sigma}{\sqrt{n}}Z_{j,k},$$
where the $Z_k$'s and $Z_{j,k}$'s are iid standard Normals. In practice, these coefficients are computed in $O(n)$ time using the discrete wavelet transform.
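The $O(n)$ claim can be illustrated with a minimal sketch of an orthonormal discrete wavelet transform. The sketch makes the following assumptions:

- It uses the Haar wavelet, whereas the simulations in Section 6 use a symmlet 8 basis.
- It takes $J_0 = 0$, so the output is ordered as $(\tilde\alpha_0, \tilde\beta_{0,0}, \tilde\beta_{1,0}, \tilde\beta_{1,1}, \ldots)$, which matches the indexing $\ell = 2^j + k + 1$ introduced below.

Each pass halves the working array, so the total work is $n + n/2 + \cdots = O(n)$. Dividing by $\sqrt{n}$ puts the output on the scale of the empirical coefficients, whose noise variance is $\sigma^2/n$. For Haar, each $\psi_{j,k}$ with $j \le J_1$ is constant on the bins, so this rescaled output coincides with the bin-integral definition above. The function names are illustrative.

```python
import math


def haar_dwt(x):
    """Orthonormal Haar DWT of a sequence whose length is a power of two.

    Returns [scaling coeff, level-0 details, level-1 details, ...], coarse to fine.
    """
    n = len(x)
    if n == 0 or n & (n - 1):
        raise ValueError("length must be a power of two")
    r = 1.0 / math.sqrt(2.0)
    approx = list(x)
    details = []  # filled finest level first
    while len(approx) > 1:
        pairs = range(0, len(approx), 2)
        details.append([(approx[i] - approx[i + 1]) * r for i in pairs])
        approx = [(approx[i] + approx[i + 1]) * r for i in pairs]
    out = approx  # the single remaining scaling coefficient
    for d in reversed(details):  # coarsest detail level first
        out.extend(d)
    return out


def empirical_coefficients(y):
    """Rescale the orthonormal transform so that the noise variance is sigma^2 / n."""
    n = len(y)
    return [c / math.sqrt(n) for c in haar_dwt(y)]
```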

We consider two types of estimation: soft thresholding and modulation. The soft-threshold estimator with threshold $\lambda \ge 0$, given by Donoho and Johnstone (1995a, 1995b), is defined by
$$\hat\alpha_k = \tilde\alpha_k, \tag{3}$$
$$\hat\beta_{j,k} = \mathrm{sgn}(\tilde\beta_{j,k})\,(|\tilde\beta_{j,k}| - \lambda)_+, \tag{4}$$
where $a_+ = \max(a, 0)$.
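The sketch below is a direct transcription of (3) and (4). It also includes the universal threshold $\lambda = p_n\hat\sigma/\sqrt{n}$ with $p_n = \sqrt{2\log n}$. It assumes the empirical coefficients are already available as floats. As in (3), the coarse coefficients $\tilde\alpha_k$ are passed through unchanged and only the $\tilde\beta_{j,k}$ are thresholded. The function names are illustrative.

```python
import math


def soft_threshold(x, lam):
    """sgn(x) * (|x| - lam)_+ as in (4)."""
    return math.copysign(max(abs(x) - lam, 0.0), x)


def universal_threshold(sigma_hat, n):
    """lambda = p_n * sigma_hat / sqrt(n) with p_n = sqrt(2 log n)."""
    return math.sqrt(2.0 * math.log(n)) * sigma_hat / math.sqrt(n)


def threshold_details(details, lam):
    """Apply (4) to every detail coefficient; coarse coefficients are left alone per (3)."""
    return [soft_threshold(b, lam) for b in details]
```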

Two common rules for choosing the threshold $\lambda$ are the universal threshold and the SureShrink threshold. To define these, let $\hat\sigma^2$ be an estimate of $\sigma^2$ and let $p_n = \sqrt{2\log n}$. The universal threshold is $\lambda = p_n\hat\sigma/\sqrt{n}$. The levelwise SureShrink rule chooses a different threshold $\lambda_j$ for the $n_j = 2^j$ coefficients at resolution level $j$ by minimizing Stein's unbiased risk estimator (SURE) with estimated variance. This is given by
$$S_n(\lambda) = \frac{\hat\sigma^2}{n}2^{J_0} + \sum_{j=J_0}^{J_1}S(\lambda_j), \tag{5}$$
where
$$S(\lambda_j) = \sum_{k=0}^{n_j-1}\Big[\frac{\hat\sigma^2}{n} - 2\frac{\hat\sigma^2}{n}1\{|\tilde\beta_{j,k}| \le \lambda_j\} + \min(\tilde\beta_{j,k}^2, \lambda_j^2)\Big] \tag{6}$$
for $J_0 \le j \le J_1$. The minimization is usually performed over $0 \le \lambda_j \le p_{n_j}\hat\sigma/\sqrt{n}$, although we shall minimize over a smaller interval for reasons that are explained in the remark after Theorem 3.2. SureShrink can also be used to select a global threshold by minimizing $S_n(\lambda)$ using the same constant $\lambda$ at every level. We call this global SureShrink.
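Because (6) is piecewise in $\lambda_j$, it can be minimized exactly rather than on a grid:

- Between consecutive values of $|\tilde\beta_{j,k}|$, the indicator terms are constant and each $\min(\tilde\beta_{j,k}^2, \lambda_j^2)$ is nondecreasing.
- At $\lambda_j = |\tilde\beta_{j,k}|$, the indicator switches on and the criterion drops by $2\hat\sigma^2/n$.

Hence the minimum over an interval $[\lambda_{lo}, \lambda_{hi}]$ is attained either at $\lambda_{lo}$ or at one of the $|\tilde\beta_{j,k}|$ inside the interval. The plain-Python sketch below uses that fact. It is written for clarity rather than speed ($O(n_j^2)$), and the names are illustrative. For global SureShrink one would pool the detail coefficients of all levels. The interval endpoints are left to the caller.

```python
def sure_level(coeffs, lam, sigma_hat, n):
    """S(lambda_j) from (6) for one resolution level."""
    s2 = sigma_hat ** 2 / n
    return sum(s2 - 2.0 * s2 * (abs(b) <= lam) + min(b * b, lam * lam) for b in coeffs)


def sure_threshold(coeffs, sigma_hat, n, lam_lo, lam_hi):
    """Minimize S(lambda) exactly over lam_lo <= lambda <= lam_hi.

    The criterion is nondecreasing between the breakpoints |b| and jumps down at
    each one, so only lam_lo and the breakpoints inside the interval need checking.
    """
    candidates = [lam_lo] + [abs(b) for b in coeffs if lam_lo <= abs(b) <= lam_hi]
    return min(candidates, key=lambda lam: sure_level(coeffs, lam, sigma_hat, n))
```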

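The second estimator we consider is the modulation estimator given by Beran and Dümbgen (1998) and Beran (2000). Although these papers did not explicitly consider wavelet estimators, we can adapt their technique to construct estimators of the form
$$\hat\alpha_k = \xi_{J_0-1}\tilde\alpha_k, \tag{7}$$
$$\hat\beta_{j,k} = \xi_j\tilde\beta_{j,k}, \tag{8}$$
where $\xi_{J_0-1}, \xi_{J_0}, \ldots, \xi_{J_1}$ are chosen to minimize SURE, which in this case is
$$S_n(\xi) = \sum_{k=0}^{2^{J_0}-1}\Big[\xi_{J_0-1}^2\frac{\hat\sigma^2}{n} + (1 - \xi_{J_0-1})^2\Big(\tilde\alpha_k^2 - \frac{\hat\sigma^2}{n}\Big)\Big] + \sum_{j=J_0}^{J_1}\sum_{k=0}^{2^j-1}\Big[\xi_j^2\frac{\hat\sigma^2}{n} + (1 - \xi_j)^2\Big(\tilde\beta_{j,k}^2 - \frac{\hat\sigma^2}{n}\Big)\Big]. \tag{9}$$
Following Beran (2000), we minimize $S_n(\xi)$ subject to a monotonicity constraint: $1 \ge \xi_{J_0-1} \ge \xi_{J_0} \ge \xi_{J_0+1} \ge \cdots \ge \xi_{J_1}$. We call this the monotone modulator, and we let $\hat\xi$ denote the $\xi$'s at which the minimum is achieved.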
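Completing the square shows why the monotone modulator is easy to compute. Let $W_j$ be the sum of squared empirical coefficients at level $j$, with level $J_0-1$ holding the $\tilde\alpha_k$, and let $n_j$ be their number. Then the level-$j$ part of (9) equals $W_j(\xi_j - g_j)^2$ plus a constant, where $g_j = 1 - n_j\hat\sigma^2/(nW_j)$.

Minimizing $S_n(\xi)$ under the ordering constraint is therefore a weighted antitonic least-squares problem, which the pool-adjacent-violators algorithm solves exactly. The sketch below makes two assumptions:

- The $\xi_j$ are also bounded below by 0; the text states only the upper bound 1.
- The bounded solution is obtained by clipping the unconstrained antitonic fit to $[0,1]$.

It is an illustration under these assumptions, not Beran's code.

```python
def pava_nonincreasing(g, w):
    """Weighted least-squares fit of a non-increasing sequence to g (pool adjacent violators)."""
    if any(wi <= 0 for wi in w):
        raise ValueError("weights must be positive")
    blocks = []  # each block is [mean, total weight, length]
    for gi, wi in zip(g, w):
        blocks.append([gi, wi, 1])
        # Merge while the last two blocks violate the non-increasing order.
        while len(blocks) > 1 and blocks[-2][0] < blocks[-1][0]:
            m2, w2, c2 = blocks.pop()
            m1, w1, c1 = blocks.pop()
            blocks.append([(m1 * w1 + m2 * w2) / (w1 + w2), w1 + w2, c1 + c2])
    fit = []
    for m, _, c in blocks:
        fit.extend([m] * c)
    return fit


def monotone_modulator(levels, sigma_hat, n):
    """levels: coefficient lists ordered coarse to fine (scaling coefficients first).

    Returns xi_{J0-1} >= xi_{J0} >= ... minimizing (9), clipped to [0, 1].
    """
    s2 = sigma_hat ** 2 / n
    w = [sum(b * b for b in lev) for lev in levels]  # W_j
    g = [1.0 - len(lev) * s2 / wj for lev, wj in zip(levels, w)]  # per-level minimizers
    return [min(1.0, max(0.0, x)) for x in pava_nonincreasing(g, w)]


def sure_modulator(levels, xi, sigma_hat, n):
    """S_n(xi) from (9)."""
    s2 = sigma_hat ** 2 / n
    return sum(len(lev) * x * x * s2 + (1.0 - x) ** 2 * sum(b * b - s2 for b in lev)
               for lev, x in zip(levels, xi))
```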

It is natural to consider minimizing $S_n(\xi)$ level by level [Donoho and Johnstone (1995a, 1995b)] or in other block minimization schemes [Cai (1999)] without the monotonicity constraint. However, we find, as in Beran and Dümbgen (1998), that the loss functions for these estimators then do not admit an asymptotic distributional limit, which is needed for the confidence set. It is possible to construct other modulators besides the monotone modulator that admit a limiting distribution; we will report on these elsewhere.

Having estimated the wavelet coefficients, we then estimate $f$, or more precisely $f_n$, by
$$\hat f_n(x) = \sum_{k=0}^{2^{J_0}-1}\hat\alpha_k\phi_{J_0,k}(x) + \sum_{j=J_0}^{J_1}\sum_{k=0}^{2^j-1}\hat\beta_{j,k}\psi_{j,k}(x). \tag{10}$$

It will be convenient to consider the wavelet coefficients, true and estimated, in the form of a single vector. Let $\mu = (\mu_1, \mu_2, \ldots)$ be the sequence of true wavelet coefficients $(\alpha_0, \ldots, \alpha_{2^{J_0}-1}, \beta_{J_0,0}, \ldots, \beta_{J_0,2^{J_0}-1}, \ldots)$. The $\alpha_k$ coefficient corresponds to $\mu_\ell$, where $\ell = k + 1$, and $\beta_{j,k}$ corresponds to $\mu_\ell$, where $\ell = 2^j + k + 1$. Let $\phi_1, \phi_2, \ldots$ denote the corresponding basis functions. Because $f \in L_2[0,1]$, we also have that $\mu \in \ell_2$. Similarly, let $\mu^n = (\mu_1, \ldots, \mu_n)$ denote the vector of the first $n$ coefficients $(\alpha_0, \ldots, \alpha_{2^{J_0}-1}, \beta_{J_0,0}, \ldots, \beta_{J_0,2^{J_0}-1}, \ldots, \beta_{J_1,2^{J_1}-1})$.

For any $c > 0$, define
$$\ell_2(c) = \Big\{\mu \in \ell_2 : \sum_\ell\mu_\ell^2 \le c^2\Big\},$$
and let $B^\varsigma_{p,q}(c)$ denote a Besov space with radius $c$. If the wavelets are $r$-regular with $r > \varsigma$, the wavelet coefficients of a function $f \in B^\varsigma_{p,q}(c)$ satisfy $\|\mu\|^\varsigma_{p,q} \le c$, where
$$\|\mu\|^\varsigma_{p,q} = \Big(\sum_{j\ge J_0}\Big(2^{j(\varsigma + 1/2 - 1/p)}\Big(\sum_k|\beta_{j,k}|^p\Big)^{1/p}\Big)^q\Big)^{1/q}. \tag{11}$$
Let
$$\gamma = \begin{cases}\varsigma, & p \ge 2,\\ \varsigma + \tfrac{1}{2} - \tfrac{1}{p}, & 1 \le p < 2.\end{cases} \tag{12}$$
We assume that $p, q \ge 1$ and also that $\gamma > 1/2$. We also assume that the mother and father wavelets are bounded, have compact support and have derivatives with finite $L_2$ norms. We will call a space of functions $f$ satisfying these assumptions a Besov ball with $\gamma > 1/2$ and radius $c$, and the corresponding body of coefficients with $\|\mu\|^\varsigma_{p,q} \le c$ a Besov body with $\gamma > 1/2$ and radius $c$. We use $\mathcal{B}$ to denote either, depending on context. If $\mathcal{B}$ is a coefficient body, we will denote by $\mathcal{B}^m$, for any positive integer $m$, the set of vectors $(\mu_1, \ldots, \mu_m)$ for $\mu \in \mathcal{B}$.

Our main results also extend to unions of Besov balls (and bodies). Fix $\eta, c > 0$, and define
$$\mathcal{F}_{\eta,c} = \bigcup_{p,q\ge1}\ \bigcup_{\gamma\ge1/2+\eta}B^\varsigma_{p,q}(c). \tag{13}$$
The parameter $\eta$ is an increment of smoothness required only in the nonsparse case ($p \ge 2$).

3. Confidence sets with σ known. Here we give explicit formulas for the confidence set when $\sigma$ is known. The proofs are deferred until Section 7, and the $\sigma$-unknown case is treated in Section 4. It is to be understood in this section that $\sigma$ replaces $\hat\sigma$ in (5) and (9).

The confidence set is of the form
$$\mathcal{D}_n = \Big\{\mu^n : \sum_{\ell=1}^n(\hat\mu_\ell - \mu_\ell)^2 \le s_n^2\Big\}. \tag{14}$$
The definition of the radius $s_n$ is given in Theorems 3.1, 3.2 and 3.3. In each case we will show that
$$\lim_{n\to\infty}\sup_{\mu\in\mathcal{B}}\big|P\{\mu^n \in \mathcal{D}_n\} - (1 - \alpha)\big| = 0 \tag{15}$$
for a coefficient body $\mathcal{B}$. Strictly speaking, the confidence set $\mathcal{D}_n$ is for approximate wavelet coefficients, but we show in Section 7 that the approximation error can be easily accounted for. By the Parseval relation, $\mathcal{D}_n$ also yields a confidence set for $f_n$. That is,
$$\lim_{n\to\infty}\sup_{f\in\mathcal{B}}\big|P\{f_n \in \mathcal{A}_n\} - (1 - \alpha)\big| = 0, \tag{16}$$
where
$$\mathcal{A}_n = \Big\{\sum_{\ell=1}^n\mu_\ell\phi_\ell : \mu^n \in \mathcal{D}_n\Big\}. \tag{17}$$
Constructing the confidence set $\mathcal{A}_n$ does not require knowledge of $c$ or $\gamma$.

At the cost of making an additional assumption, namely, an upper bound on the ball size, $\mathcal{A}_n$ can be dilated slightly to produce a confidence set for $f$. Fix $\eta, c > 0$ and recall the definition of $\mathcal{F}_{\eta,c}$ from (13). Then the set
$$\mathcal{C}_n = \Big\{f \in \mathcal{F}_{\eta,c} : \|f - \hat f_n\|_2^2 \le s_n^2 + \frac{\delta}{\sqrt{n}}\Big\}, \tag{18}$$
for $\delta > 0$, satisfies
$$\liminf_{n\to\infty}\inf_{f\in\mathcal{F}_{\eta,c}}P\{f \in \mathcal{C}_n\} \ge 1 - \alpha. \tag{19}$$
The factor $\delta/\sqrt{n}$ accommodates the difference between the true and approximate wavelet coefficients. The overcoverage in (19) occurs because one never really estimates $f$; rather, any data-based procedure is inevitably estimating $f_n$.



Remark 3.1. It is not surprising that sharp inferences are available for $f_n$ only. The difference between $f$ and $f_n$ is effectively not estimable. In the context of kernel density estimation, Neumann (1998) and Chaudhuri and Marron (2000) argue that it is sensible to confine inferences to the smoothed version of the unknown density.



Remark 3.2. The theorems that follow state that the confidence sets have correct asymptotic coverage over a Besov space $\mathcal{B}$ with $\gamma > 1/2$. These results all hold replacing $\mathcal{B}$ by $\mathcal{F}_{\eta,c}$ for any $\eta, c > 0$. It is also worth noting that if $p < 2$, the results still hold with $\gamma = 1/2$.

Theorem 3.1 (Universal threshold). Suppose that $\hat f_n$ is the estimator based on the global threshold $\lambda = p_n\sigma/\sqrt{n}$. Let
$$s_n^2 = \frac{\sigma^2z_\alpha}{\sqrt{n/2}} + S_n(\lambda). \tag{20}$$
Then (15), (16) and (19) hold for any Besov body $\mathcal{B}$ with $\gamma > 1/2$ and radius $c > 0$.



We consider a restricted version of the SureShrink estimator where we minimize SURE over $\rho p_n\sigma/\sqrt{n} \le \lambda \le p_n\sigma/\sqrt{n}$, where $\rho > 1/\sqrt{2}$.

Theorem 3.2 (Restricted SureShrink). Let $1/\sqrt{2} < \rho < 1$. In the global case, let $\hat\lambda_{J_0} = \cdots = \hat\lambda_{J_1} = \hat\lambda$ be obtained by minimizing $S_n(\lambda)$ over $\rho p_n\sigma/\sqrt{n} \le \lambda \le p_n\sigma/\sqrt{n}$. In the levelwise case, let $\hat\lambda = (\hat\lambda_{J_0}, \ldots, \hat\lambda_{J_1})$ be obtained by minimizing $S_n(\lambda_{J_0}, \ldots, \lambda_{J_1})$ over the same interval at each level. Let
$$s_n^2 = \frac{\sigma^2z_\alpha}{\sqrt{n/2}} + S_n(\hat\lambda). \tag{21}$$
Then (15), (16) and (19) hold for any Besov body $\mathcal{B}$ with $\gamma > 1/2$ and radius $c > 0$.

Remark 3.3. We conjecture that our results hold with only the restriction that $\rho > 0$. We hope to report on this extension in a future paper. Interestingly, the above theorem does not hold for $\rho = 0$ because the asymptotic equicontinuity of $B_n$ fails, so some restriction on SureShrink appears to be necessary.

Remark 3.4. The theorem also holds with a data-splitting scheme similar to that used in Nason (1996) and Picard and Tribouley (2000), where we use one half of the data to estimate the SURE-minimizing threshold and the other half to construct the confidence set. In the case $\rho > 1/\sqrt{2}$ this is not required, but it may be needed in the more general case $\rho > 0$.

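Finally, we consider the modulation estimator.

Theorem 3.3 (Modulators). Let $\hat f_n$ be the estimate obtained from the monotone modulator. Let
$$s_n^2 = \frac{\hat\tau z_\alpha}{\sqrt{n}} + S_n(\hat\xi), \tag{22}$$
where
$$\hat\tau^2 = \frac{2\sigma^4}{n}\sum_{\ell=1}^n(2\hat\xi_\ell - 1)^2 + 4\sigma^2\sum_{\ell=1}^n\Big(\tilde\mu_\ell^2 - \frac{\sigma^2}{n}\Big)(1 - \hat\xi_\ell)^2, \tag{23}$$
where $\tilde\mu_\ell$ is the $\ell$th empirical wavelet coefficient and $\hat\xi_\ell$ is the estimated shrinkage coefficient associated with $\tilde\mu_\ell$. Then (15), (16) and (19) hold for any Besov body $\mathcal{B}$ with $\gamma > 1/2$ and radius $c > 0$.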
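The radius in (22) and (23) can be assembled directly from two inputs: the empirical coefficients and the per-coefficient shrinkage factors. Each $\hat\xi_\ell$ is the level factor $\hat\xi_j$ of the level containing $\ell$.

In the sketch below, $\hat\tau^2$ is truncated at zero before taking the square root, because the second sum in (23) can be negative in finite samples. That truncation is an assumption of this illustration, not part of the theorem. With no shrinkage ($\hat\xi_\ell \equiv 1$), the second term vanishes and $\hat\tau^2 = 2\sigma^4$, which gives a quick check. The names are illustrative.

```python
import math
from statistics import NormalDist


def modulator_tau2(mu_tilde, xi, sigma):
    """tau_hat^2 from (23); xi holds one shrinkage factor per coefficient."""
    n = len(mu_tilde)
    s2 = sigma ** 2 / n
    first = 2.0 * sigma ** 4 / n * sum((2.0 * x - 1.0) ** 2 for x in xi)
    second = 4.0 * sigma ** 2 * sum((m * m - s2) * (1.0 - x) ** 2 for m, x in zip(mu_tilde, xi))
    return first + second


def modulator_radius_sq(mu_tilde, xi, sigma, alpha=0.05):
    """s_n^2 from (22): tau_hat * z_alpha / sqrt(n) + S_n(xi_hat)."""
    n = len(mu_tilde)
    s2 = sigma ** 2 / n
    sure = sum(x * x * s2 + (1.0 - x) ** 2 * (m * m - s2) for m, x in zip(mu_tilde, xi))
    tau = math.sqrt(max(modulator_tau2(mu_tilde, xi, sigma), 0.0))  # truncate at 0 (assumption)
    return tau * NormalDist().inv_cdf(1.0 - alpha) / math.sqrt(n) + sure
```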

4. Confidence sets with σ unknown. Suppose now that $\sigma$ is not known. We consider two cases. The first, assumed in Beran and Dümbgen [(1998), equation 3.2], is that there exists an independent, uniformly consistent estimate of $\sigma$. For example, if there are replications at each design point, then the residuals at these points provide the required estimator $\hat\sigma$. More generally, letting $\mathcal{L}(\cdot)$ denote the law of a random variable, they assume the following condition:

(S1) There exists an estimate $\hat\sigma^2$, independent of the empirical wavelet coefficients, such that $\mathcal{L}(\hat\sigma^2/\sigma^2)$ depends only on $n$ and such that
$$\lim_{n\to\infty}m\big(\mathcal{L}(n^{1/2}(\hat\sigma^2/\sigma^2 - 1)), N(0, U^2)\big) = 0,$$
where $m(\cdot,\cdot)$ metrizes weak convergence and $U > 0$.

In the absence of replication (or any other independent estimate of $\sigma^2$), we estimate $\sigma^2$ by
$$\hat\sigma_n^2 = 2\sum_{\ell=(n/2)+1}^n\tilde\mu_\ell^2, \tag{24}$$
which Beran (2000) calls the high-component estimator. We then need to assume that $\mu^n$ is contained in a more restrictive space. Specifically, we assume the following:

(S2) The coefficients $\mu$ of $f$ are contained in the set
$$\{\mu \in \ell_2(c) : \|\beta_{j\cdot}\|^2 \le \zeta_j,\ j \ge J_2\}$$
for some $c > 0$, $J_2 > J_0$ and some sequence of positive reals $\zeta = (\zeta_1, \zeta_2, \ldots)$, where $\zeta_j = O(2^{-j/2})$ and $\beta_{j\cdot}$ denotes the resolution-$j$ coefficients.

Condition (S2) holds when $f$ is in a Besov ball $\mathcal{B}$ with $\gamma > 1/2$. We note that such a condition is implicit in Beran (2000) and Beran and Dümbgen (1998) in the absence of (S1).

Beran and Dümbgen (1998) construct confidence sets with $\sigma$ unknown by including an extra term in the formula for $\hat\tau^2$ to account for the variability in $\hat\sigma^2$. This strategy is feasible for modulators, since terms involving $\hat\sigma^2$ separate nicely in the estimated loss from the rest of the data. In thresholding estimators the empirical process in Theorem 7.2 depends on $\hat\sigma_n$ in a complicated way, making it difficult to deal with $\hat\sigma$ separately. We offer two methods for this case. For the soft-thresholded wavelet estimators it turns out that a plug-in method suffices. More generally, we can use a "double confidence set" approach.

For both approaches we need the uniform consistency of $\hat\sigma$.

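Lemma 4.1. For any Besov body $\mathcal{B}$ with $\gamma > 1/2$ and for every $\epsilon > 0$,
$$\sup_{\mu\in\mathcal{B}}P\Big\{\Big|\frac{\hat\sigma_n^2}{\sigma^2} - 1\Big| > \epsilon\Big\} \to 0. \tag{25}$$
The proof of this lemma is straightforward and is omitted.

In the plug-in approach we simply replace $\sigma$ by $\hat\sigma$ in the expressions of the last section.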
Theorem 4.1 (Plug-in confidence ball). Theorems 3.1 and 3.2 continue to hold if $\hat\sigma$ replaces $\sigma$. For the modulation estimator, Theorem 3.3 holds with $\hat\tau^2$ replaced by
$$\frac{2\hat\sigma^4}{n}\sum_{\ell=1}^n(2\hat\xi_\ell - 1)^2 + U^2\hat\sigma^4\Big(\frac{1}{n}\sum_{\ell=1}^n(2\hat\xi_\ell - 1)\Big)^2 + 4\hat\sigma^2\sum_{\ell=1}^n\Big(\tilde\mu_\ell^2 - \frac{\hat\sigma^2}{n}\Big)(1 - \hat\xi_\ell)^2. \tag{26}$$

In the double confidence set approach, the confidence set is the "tube" equal to the union of confidence balls obtained by treating $\sigma$ as known for every value in a confidence interval for $\sigma$. We first need a uniform confidence interval for $\sigma$. This is given in the following theorem; the proof is straightforward and is omitted.

Theorem 4.2. Let
$$Q_n = \Big\{\sigma > 0 : \sqrt{n}\,\Big|\frac{\hat\sigma_n^2}{\sigma^2} - 1\Big| \le Uz_{\alpha/2}\Big\}. \tag{27}$$
Under condition (S1) we have
$$\liminf_{n\to\infty}\inf_{\sigma>0}P\{\sigma \in Q_n\} \ge 1 - \alpha. \tag{28}$$
Under condition (S2) with $U = 2$, we have, for any Besov body $\mathcal{B}$ with $\gamma > 1/2$,
$$\liminf_{n\to\infty}\inf_{\mu\in\mathcal{B},\,\sigma>0}P\{\sigma \in Q_n\} \ge 1 - \alpha. \tag{29}$$
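The high-component estimator (24), and an interval for $\sigma$ obtained by inverting the normal limit $|\sqrt{n}(\hat\sigma_n^2/\sigma^2 - 1)| \le Uz_{\alpha/2}$, are easy to compute.

Under (S2), the limit variance is $U^2 = 4$. When the high-resolution coefficients are negligible, each $\tilde\mu_\ell^2$ in the upper half has variance about $2\sigma^4/n^2$, so $\hat\sigma_n^2 = 2\sum\tilde\mu_\ell^2$ has variance about $4\sigma^4/n$.

Solving the inequality for $\sigma$ gives the endpoints $\hat\sigma_n(1 + h)^{-1/2}$ and $\hat\sigma_n(1 - h)^{-1/2}$, where $h = Uz_{\alpha/2}/\sqrt{n}$. The upper endpoint is infinite if $h \ge 1$.

The sketch below implements both. Its names are illustrative, and the pure-noise coefficients are simulated only to exercise the code.

```python
import math
import random
from statistics import NormalDist


def high_component_sigma2(mu_tilde):
    """(24): twice the sum of squares of the upper half of the n empirical coefficients."""
    n = len(mu_tilde)
    return 2.0 * sum(c * c for c in mu_tilde[n // 2:])


def sigma_interval(sigma2_hat, n, U=2.0, alpha=0.05):
    """Solve |sqrt(n) (sigma2_hat / sigma^2 - 1)| <= U z_{alpha/2} for sigma."""
    h = U * NormalDist().inv_cdf(1.0 - alpha / 2.0) / math.sqrt(n)
    lower = math.sqrt(sigma2_hat / (1.0 + h))
    upper = math.sqrt(sigma2_hat / (1.0 - h)) if h < 1.0 else math.inf
    return lower, upper


# Exercise the code on simulated pure-noise coefficients with sigma = 2.
rng = random.Random(1)
n = 4096
noise = [rng.gauss(0.0, 2.0 / math.sqrt(n)) for _ in range(n)]
s2_hat = high_component_sigma2(noise)
print(s2_hat, sigma_interval(s2_hat, n))
```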

Theorem 4.3 (Double confidence set). Let $\delta = 1 - \sqrt{1 - \alpha}$ if (S1) holds and let $\delta = \alpha/2$ if (S2) holds. Let $Q_n$ be an asymptotic $1 - \delta$ confidence interval for $\sigma$, as in Theorem 4.2. Let
$$\mathcal{D}_n = \bigcup_{\sigma\in Q_n}\mathcal{D}_{n,\sigma}, \tag{30}$$
where $\mathcal{D}_{n,\sigma}$ is a $1 - \delta$ confidence ball for $\mu^n$ from the previous section obtained with fixed $\sigma$. Then
$$\liminf_{n\to\infty}\inf_{\mu\in\mathcal{B}}P\{\mu^n \in \mathcal{D}_n\} \ge 1 - \alpha. \tag{31}$$
Finally, under condition (S1) or (S2), Theorems 3.1, 3.2 and 3.3 continue to hold with (31) replacing (15) and $\mathcal{D}_n$ as in (30).
5. Confidence sets for functionals. Let $f \mapsto f^*$ be the operation that takes $f$ to the approximation defined in (44). The reader can think of $f^*$ as simply the projection $f_n$ of $f$ onto the span of the first $n$ basis functions. Define $\mathcal{C}_n^*$ to be the set of $f^*$ corresponding to coefficient sequences $\mu^n \in \mathcal{D}_n$. For real-valued functionals $T$, define
$$J_n^*(T) = \Big(\inf_{f^*\in\mathcal{C}_n^*}T(f^*),\ \sup_{f^*\in\mathcal{C}_n^*}T(f^*)\Big). \tag{32}$$
We then have the following immediately from the asymptotic coverage of the confidence set.
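When $T$ is linear, the endpoints of $J_n^*(T)$ have a closed form. Write $T(f^*) = \sum_{\ell=1}^n w_\ell\mu_\ell$ with $w_\ell = T(\phi_\ell)$, and take $\mathcal{D}_n$ to be the ball of squared radius $s_n^2$ about $\hat\mu^n$. The Cauchy–Schwarz inequality then gives the infimum $\sum_\ell w_\ell\hat\mu_\ell - s_n\|w\|_2$ and the supremum $\sum_\ell w_\ell\hat\mu_\ell + s_n\|w\|_2$. These are attained at $\hat\mu^n \mp s_nw/\|w\|_2$. The sketch assumes the weights $w_\ell$ have already been computed for the functional of interest, for instance a local average. The helper name is illustrative.

```python
import math


def linear_functional_interval(w, mu_hat, radius_sq):
    """(inf, sup) of sum_l w_l mu_l over the ball sum_l (mu_l - mu_hat_l)^2 <= radius_sq."""
    center = sum(wl * ml for wl, ml in zip(w, mu_hat))
    half_width = math.sqrt(radius_sq) * math.sqrt(sum(wl * wl for wl in w))
    return center - half_width, center + half_width
```

Simultaneous intervals for a family of linear functionals reuse the same centre $\hat\mu^n$ and radius $s_n$; only the weight vector changes.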



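Lemma 5.1. Let $\mathcal{T}$ be a set of real-valued functionals on a Besov ball $\mathcal{B}$ with $\gamma > 1/2$ and radius $c > 0$. Then
$$\liminf_{n\to\infty}\inf_{f\in\mathcal{B}}P\{T(f^*) \in J_n^*(T)\ \text{for all}\ T \in \mathcal{T}\} \ge 1 - \alpha. \tag{33}$$

We can extend the previous result to include sets of functionals of slowly increasing resolution. Let $\mathcal{F}$ be a function class and let $\mathcal{T}_n$ be a sequence of sets of real-valued functionals on $\mathcal{F}$. Define the worst-case approximation error over $\mathcal{F}$ and $\mathcal{T}_n$ by
$$r_n(\mathcal{F}, \mathcal{T}_n) = \sup_{T\in\mathcal{T}_n}\sup_{f\in\mathcal{F}}|T(f) - T(f^*)|.$$
For a sequence $w_n$, define
$$J_n(T) = \Big(\inf_{f^*\in\mathcal{C}_n^*}T(f^*) - w_n,\ \sup_{f^*\in\mathcal{C}_n^*}T(f^*) + w_n\Big). \tag{34}$$

Theorem 5.1. For a function class $\mathcal{F}$ and a sequence $\mathcal{T}_n$ of sets of real-valued functionals on $\mathcal{F}$, if $w_n \ge r_n(\mathcal{F}, \mathcal{T}_n)$, then
$$\liminf_{n\to\infty}\inf_{f\in\mathcal{F}}P\{T(f) \in J_n(T)\ \text{for all}\ T \in \mathcal{T}_n\} \ge 1 - \alpha. \tag{35}$$
Proof. Follows from the triangle inequality and Lemma 5.1. □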



Remark 5.1. If the functionals in $\mathcal{T}_n$ are point evaluators $T(f) = f(x)$, then the confidence sets above yield confidence bands.

For a given compactly supported wavelet basis, define the integer $\kappa$ to be the maximum number of basis functions within a single resolution level whose support contains any single point:
$$\kappa = \sup\big\{\#\{k : \psi_{j,k}(x) \ne 0,\ 0 \le k < 2^j\} : 0 \le x \le 1,\ j \ge J_0\big\}.$$
Note also that $\|\psi_{j,k}\|_1 = 2^{-j/2}\|\psi\|_1$. Both $\kappa$ and $\|\psi\|_1$ are finite for all the commonly used wavelets.

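As an example, we consider local averages over intervals whose length decreases with $n$.

Theorem 5.2. Fix a decreasing sequence $\Delta_n > 0$ and define
$$\mathcal{T}_n = \Big\{T : T(f) = \frac{1}{b - a}\int_a^bf\,dx,\ 0 \le a < b \le 1,\ |b - a| \ge \Delta_n\Big\}.$$
Fix $\eta, c > 0$ and let $\mathcal{F}_{\eta,c}$ be the union of Besov balls defined in (13). If the mother and father wavelets are compactly supported with $\kappa < \infty$ and $\|\psi\|_1 < \infty$, and if $\Delta_n^{-1} = o(n^\zeta/(\log n)^{\cdots})$ for some $0 < \zeta < 1$, then
$$r_n(\mathcal{F}_{\eta,c}, \mathcal{T}_n) = o\big(n^{\zeta-1}(\log n)^{\cdots}\big). \tag{36}$$
Hence, for any sequence $w_n > 0$ that satisfies $w_n \to 0$ and $\liminf_{n\to\infty}w_nn^{1-\zeta}(\log n)^{\cdots} > 0$,
$$\liminf_{n\to\infty}\inf_{f\in\mathcal{F}_{\eta,c}}P\{T(f) \in J_n(T)\ \text{for all}\ T \in \mathcal{T}_n\} \ge 1 - \alpha. \tag{37}$$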

6. Numerical examples. Here we study the confidence sets for the zero function $f_0(x) = 0$ and for the two examples considered in Beran and Dümbgen (1998). We also compare the wavelet confidence sets to confidence sets obtained from a cosine basis as in Beran (2000).

The two functions, defined on $[0,1]$, are given by
$$f_1(x) = 2(6.75)^3x^6(1 - x)^3, \tag{38}$$
$$f_2(x) = \begin{cases}1.5, & 0 \le x < 0.3,\\ \ldots, & 0.3 \le x < 0.6,\\ \ldots, & 0.6 \le x < 0.8,\\ \ldots, & \text{otherwise}.\end{cases} \tag{39}$$
Tables 1 and 2 report the results of a simulation using $\alpha = 0.05$, $n = 1024$, $\sigma = 1$ and 5000 iterations (which gives a 95% confidence interval for the estimated coverage of length no more than 0.025). For comparison, the squared radius of the standard 95% $\chi^2$ confidence ball, which uses no smoothing, is 1.074 (a radius of about 1.036). We used a symmlet 8 wavelet basis, and all the calculations were done using the S+Wavelets package.
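The coverage entry in Table 1 for the zero function under the universal threshold can be approximated with a small simulation. The sketch below is an idealization, not the authors' S+Wavelets code:

- It works directly in the coefficient domain: for $f_0$, the empirical coefficients from an orthonormal transform are iid $N(0, \sigma^2/n)$.
- It thresholds all $n$ coefficients rather than leaving the $2^{J_0}$ coarse ones untouched.
- It uses the squared radius $s_n^2 = \sigma^2z_\alpha\sqrt{2/n} + S_n(\lambda)$ with known $\sigma$.

Its output should be close to 0.95, but it will not reproduce the table exactly. The function name is illustrative; calling `coverage_zero_function()` runs 1000 replications.

```python
import math
import random
from statistics import NormalDist


def coverage_zero_function(n=1024, sigma=1.0, alpha=0.05, reps=1000, seed=0):
    """Monte Carlo coverage of the universal-threshold confidence ball when f = 0."""
    rng = random.Random(seed)
    sd = sigma / math.sqrt(n)                # noise sd of each empirical coefficient
    s2 = sd * sd                             # sigma^2 / n
    lam = math.sqrt(2.0 * math.log(n)) * sd  # universal threshold p_n * sigma / sqrt(n)
    z = NormalDist().inv_cdf(1.0 - alpha)
    hits = 0
    for _ in range(reps):
        loss = 0.0
        sure = 0.0
        for _ in range(n):
            x = rng.gauss(0.0, sd)           # the true coefficient is 0
            est = math.copysign(max(abs(x) - lam, 0.0), x)
            loss += est * est                # (est - 0)^2
            sure += s2 - 2.0 * s2 * (abs(x) <= lam) + min(x * x, lam * lam)
        radius_sq = sigma ** 2 * z * math.sqrt(2.0 / n) + sure
        hits += loss <= radius_sq
    return hits / reps
```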

7. Technical results. Recall that the model is
$$Y_i = f(x_i) + \sigma\epsilon_i,$$
where the $\epsilon_i \sim N(0,1)$ are iid and $f(x) = \sum_j\mu_j\phi_j(x)$. Let $X_j$ denote the empirical wavelet coefficients given by
$$X_j = \sum_{i=1}^nY_i\int_{(i-1)/n}^{i/n}\phi_j(x)\,dx.$$
Table 1
Coverage and average confidence ball radius, by method, in the σ-known case. Here n = 1024 and σ = 1

Method                   Function   Coverage   Average radius
Universal                f0         0.951      0.274
                         f1         0.949      0.299
                         f2         0.935      0.439
SureShrink (global)      f0         0.946      0.270
                         f1         0.941      0.292
                         f2         0.937      0.401
SureShrink (levelwise)   f0         0.944      0.268
                         f1         0.940      0.289
                         f2         0.927      0.395
Modulator (wavelet)      f0         0.941      0.258
                         f1         0.940      0.269
                         f2         0.933      0.329
Modulator (cosine)       f0         0.931      0.253
                         f1         0.930      0.259
                         f2         0.905      0.318


Then $X^n = (X_1, \ldots, X_n)$ are multivariate Normal with
$$EX_j = \bar\mu_j + O(1/n), \qquad \mathrm{Var}\,X_j = \frac{\sigma^2}{n} + O(1/n^2), \tag{40}$$
uniformly over $\mathcal{B}$ [Donoho and Johnstone (1999)], where $\bar\mu_\ell = \int\tilde f_n\phi_\ell$, with $\tilde f_n$ as defined in (43). The $X_j$'s are asymptotically independent.

That the $X_j$'s are only asymptotically independent poses no problem. Using the orthogonal discrete wavelet transform to define the empirical wavelet coefficients yields an $X^n$ whose components are exactly independent. Donoho and Johnstone (1999) show that the means and variances of these two versions of $X^n$ are close. From this, it follows that the Kullback-Leibler distance, and hence the total variation distance, between the law of $\sqrt{n}(X^n - \bar\mu^n)$ and an $N_n(0, \sigma^2I)$ tends to 0 uniformly, where $\bar\mu^n = (\bar\mu_1, \ldots, \bar\mu_n)$. In what follows, we may thus assume the $X_j$ are independent $N(\bar\mu_j, \sigma^2/n)$.



Table 2
Coverage, by thresholding method, in the σ-unknown case using the Plug-in Confidence Ball. Again n = 1024 and σ = 1

Function   Universal   Sure GL   Sure LW   WaveMod   CosMod
f0         0.961       0.955     0.954     0.955     0.999
f1         0.963       0.955     0.953     0.961     0.999
f2         0.938       0.940     0.929     0.951     0.997
It will be helpful to introduce some notation before proceeding with the ensuing sections. Let $\sigma_n^2 = \sigma^2/n$ and define $r_n = p_n\sigma/\sqrt{n}$, where $p_n = \sqrt{2\log n}$. Also define $v_{ni} = -\sqrt{n}\mu_i/\sigma$, and let $a_{ni} = v_{ni} - up_n$ and $b_{ni} = v_{ni} + up_n$. Note that $\sqrt{n}X_i/\sigma = \epsilon_i - v_{ni}$. Define
$$I_{ni}(u) = 1\{|X_i| \le ur_n\} = 1\{v_{ni} - up_n \le \epsilon_i \le v_{ni} + up_n\} = 1\{a_{ni} \le \epsilon_i \le b_{ni}\},$$
$$I^+_{ni}(u) = 1\{X_i > ur_n\} = 1\{\epsilon_i > v_{ni} + up_n\} = 1\{\epsilon_i > b_{ni}\},$$
$$I^-_{ni}(u) = 1\{X_i < -ur_n\} = 1\{\epsilon_i < v_{ni} - up_n\} = 1\{\epsilon_i < a_{ni}\},$$
$$J_{ni}(s,t) = 1\{sr_n < X_i < tr_n\} = 1\{v_{ni} + sp_n < \epsilon_i < v_{ni} + tp_n\}.$$
For $0 \le u \le 1$ and $1 \le i \le n$, define
$$Z_{ni}(u) = \sqrt{n}\Big[\big((X_i - ur_n)1\{X_i > ur_n\} + (X_i + ur_n)1\{X_i < -ur_n\} - \mu_i\big)^2 - \big(\sigma_n^2 - 2\sigma_n^21\{X_i^2 \le u^2r_n^2\} + \min(X_i^2, u^2r_n^2)\big)\Big]$$
$$= \frac{\sigma^2}{\sqrt{n}}\Big\{(\epsilon_i^2 - 1)(1 - 2I_{ni}(u)) + 2v_{ni}\epsilon_iI_{ni}(u) - 2up_n\epsilon_i\big(I^+_{ni}(u) - I^-_{ni}(u)\big)\Big\}. \tag{41}$$

Each $Z_{ni}$ represents the contribution of the $i$th observation to the pivot process and satisfies $EZ_{ni}(u) = 0$ for every $0 \le u \le 1$. We also have that
$$Z_{ni}^2(u) = \frac{\sigma^4}{n}\Big[(\epsilon_i^2 - 1)^2 + 4v_{ni}^2\epsilon_i^2I_{ni}(u) + 4u^2p_n^2\epsilon_i^2(1 - I_{ni}(u)) - 4v_{ni}\epsilon_i(\epsilon_i^2 - 1)I_{ni}(u) - 4up_n\epsilon_i(\epsilon_i^2 - 1)\big(I^+_{ni}(u) - I^-_{ni}(u)\big)\Big]. \tag{42}$$
The relevance of these definitions will become clear subsequently. Throughout this section $C$ denotes a generic positive constant, not depending on $n$, $\mu$ or $\sigma$, that may change from expression to expression.
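The two expressions for $Z_{ni}(u)$ in (41), and the zero-mean property, can be checked numerically. The sketch below evaluates the first expression directly: loss minus SURE for one coefficient, scaled by $\sqrt{n}$. It evaluates the second expression from $\epsilon_i$ and $v_{ni}$. The choices of $\mu_i$, $u$, $\sigma$ and $n$ are arbitrary, and the function names are hypothetical.

Agreement up to rounding is a consistency check of the algebra. The Monte Carlo average illustrates $EZ_{ni}(u) = 0$, which holds because SURE is unbiased for the risk of soft thresholding.

```python
import math
import random


def z_direct(x, mu, u, sigma, n):
    """sqrt(n) * [(soft-threshold estimate - mu)^2 - SURE term] at threshold u * r_n."""
    s2 = sigma ** 2 / n
    t = u * math.sqrt(2.0 * math.log(n)) * sigma / math.sqrt(n)  # u * r_n
    est = (x - t) * (x > t) + (x + t) * (x < -t)
    sure = s2 - 2.0 * s2 * (x * x <= t * t) + min(x * x, t * t)
    return math.sqrt(n) * ((est - mu) ** 2 - sure)


def z_standardized(eps, mu, u, sigma, n):
    """(sigma^2 / sqrt(n)) {(eps^2 - 1)(1 - 2I) + 2 v eps I - 2 u p eps (I+ - I-)}."""
    p = math.sqrt(2.0 * math.log(n))
    v = -math.sqrt(n) * mu / sigma          # v_ni
    a, b = v - u * p, v + u * p             # a_ni, b_ni
    i_in, i_plus, i_minus = a <= eps <= b, eps > b, eps < a
    return sigma ** 2 / math.sqrt(n) * ((eps * eps - 1.0) * (1 - 2 * i_in)
                                        + 2.0 * v * eps * i_in
                                        - 2.0 * u * p * eps * (i_plus - i_minus))


rng = random.Random(7)
n, sigma = 1024, 1.0
max_gap = 0.0
for _ in range(20000):
    mu, u, eps = rng.uniform(-0.3, 0.3), rng.uniform(0.7, 1.0), rng.gauss(0.0, 1.0)
    x = mu + sigma * eps / math.sqrt(n)     # X_i = mu_i + sigma * eps_i / sqrt(n)
    max_gap = max(max_gap, abs(z_direct(x, mu, u, sigma, n) - z_standardized(eps, mu, u, sigma, n)))
draws = [z_standardized(rng.gauss(0.0, 1.0), 0.05, 0.9, sigma, n) for _ in range(200000)]
mean_z = sum(draws) / len(draws)
```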

7.1. Absorbing approximation and projection errors. As noted in the statements of Theorems 3.1, 3.2 and 3.3, the confidence set $\mathcal{D}_n$ for $\mu^n$ induces a confidence set for $f$ uniformly over Besov spaces. In this section we make this precise.

Define
$$\tilde f_n(x) = n\sum_{i=1}^n1_{((i-1)/n,\,i/n]}(x)\int_{(i-1)/n}^{i/n}f(t)\,dt \tag{43}$$
and its projection
$$f^*(x) = \sum_{j=1}^n\bar\mu_j\phi_j(x), \qquad \bar\mu_j = \int\tilde f_n\phi_j. \tag{44}$$
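Theorem 7.1. Fix $c, \eta > 0$. Let $\mathcal{F}_{\eta,c}$ also denote the body of coefficients corresponding to $\mathcal{F}_{\eta,c}$. Let $\mathcal{D}_n$ be defined by (14) and suppose that
$$\liminf_{n\to\infty}\inf_{\mu\in\mathcal{F}_{\eta,c}}P\{\bar\mu^n \in \mathcal{D}_n\} \ge 1 - \alpha.$$
Let $\mathcal{C}_n$ be defined as in (18). Then
$$\liminf_{n\to\infty}\inf_{f\in\mathcal{F}_{\eta,c}}P\{f \in \mathcal{C}_n\} \ge 1 - \alpha. \tag{45}$$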

Proof. From the results in Brown and Zhao (2001), it follows that $\|f - \tilde f_n\|_2^2$ and $\|f - f_n\|_2^2 = \sum_{j=n+1}^\infty\mu_j^2$ are both bounded, uniformly over $B^\varsigma_{p,q}(c)$, by $(C\log n)/n^{2\gamma}$ for some $C > 0$ not depending on $p$, $q$ or $\varsigma$. Since $f_n$ and $f^*$ are the projections of $f$ and $\tilde f_n$ onto the span of $\phi_1, \ldots, \phi_n$, it then follows that, for any $f \in \mathcal{F}_{\eta,c}$,
$$\|f_n - f^*\|_2^2 \le \|f - \tilde f_n\|_2^2 \le \frac{C\log n}{n^{1+2\eta}} \equiv k_n^2,$$
and likewise $\|f - f_n\|_2^2 \le k_n^2$. Let
$$\tilde s_n^2 = s_n^2 + \delta_n,$$
where $\delta_n = \delta/\sqrt{n}$ with $\delta > 0$ as in (18). Let $W_n^2 = \|\hat f_n - f^*\|_2^2 = \sum_{\ell=1}^n(\hat\mu_\ell - \bar\mu_\ell)^2$. Then
$$\|\hat f_n - f_n\|_2 \le \|\hat f_n - f^*\|_2 + \|f^* - f_n\|_2 \le W_n + k_n$$
uniformly over $\mathcal{F}_{\eta,c}$. Since $f - f_n$ is orthogonal to the span of $\phi_1, \ldots, \phi_n$, which contains $f_n - \hat f_n$,
$$\|f - \hat f_n\|_2^2 = \|f - f_n\|_2^2 + \|f_n - \hat f_n\|_2^2 \le k_n^2 + (W_n + k_n)^2.$$
Hence
$$P\{\|f - \hat f_n\|_2^2 > \tilde s_n^2\} \le P\{W_n^2 + 2k_nW_n + 2k_n^2 > s_n^2 + \delta_n\} \le P\{W_n^2 > s_n^2\} + P\{2k_nW_n + 2k_n^2 > \delta_n\}.$$
By hypothesis, the limsup of the first term, uniformly over $\mathcal{F}_{\eta,c}$, is bounded above by $\alpha$. For the second term, since $\delta_n - 2k_n^2 > \delta_n/2$ for large $n$,
$$\limsup_{n\to\infty}\sup_{\mathcal{F}_{\eta,c}}P\{2k_nW_n + 2k_n^2 > \delta_n\} \le \limsup_{n\to\infty}\sup_{\mathcal{F}_{\eta,c}}P\Big\{W_n > \frac{\delta n^\eta}{4\sqrt{C\log n}}\Big\} = 0,$$
because the threshold on the right tends to infinity while $W_n$ is bounded in probability uniformly over $\mathcal{F}_{\eta,c}$. For instance, $W_n \le \|X^n\| + \|\bar\mu^n\|$, since $|\hat\mu_\ell| \le |X_\ell|$ for both thresholding and modulation. Hence $\limsup_{n\to\infty}\sup_{\mathcal{F}_{\eta,c}}P\{\|f - \hat f_n\|_2^2 > \tilde s_n^2\} \le \alpha$. □

7.2. The pivot process. In the rest of this section, for convenience, we will denote $\bar\mu_j$ simply by $\mu_j$. We now focus on the confidence set $\mathcal{D}_n$ for $\mu^n$ defined by
$$\mathcal{D}_n = \Big\{\mu^n : \sum_{i=1}^n(\hat\mu_i - \mu_i)^2 \le s_n^2\Big\}.$$
Our main task in showing that $\mathcal{D}_n$ has correct asymptotic coverage is to show that the pivot process has a tight Gaussian limit. See van der Vaart and Wellner (1996) for the definition of a tight Gaussian limit.

For $i = 1, \ldots, n$, let $j(i)$ denote the resolution level to which index $i$ belongs, and for $j = J_0, \ldots, J_1$, let $I_j$ denote the set of indices at resolution level $j$, which contains $n_j = 2^j$ elements. Let $t$ be a sequence of thresholds with one component per resolution level starting at $J_0$, where each $t_j$ is in the range $[\rho p_n\sigma_n, p_n\sigma_n]$. It is convenient to write $t = up_n\sigma/\sqrt{n}$, where $u$ is a corresponding sequence of values in $[\rho, 1]$. In levelwise thresholding, the $t_j$'s (and $u_j$'s) are allowed to vary independently. In global thresholding, all of the $t_j$'s (and $u_j$'s) are equal; in this case, we treat $t$ (and $u$) interchangeably as a sequence or scalar as convenient.

The soft threshold estimator $\hat\mu$ is defined by
$$\hat\mu_i(t) = (X_i - t_{j(i)})1\{X_i > t_{j(i)}\} + (X_i + t_{j(i)})1\{X_i < -t_{j(i)}\} \tag{46}$$
for $i = 1, \ldots, n$. The corresponding loss as a function of threshold is
$$L_n(t) = \sum_{i=1}^n(\hat\mu_i(t) - \mu_i)^2.$$
We can write Stein's unbiased risk estimate as
$$S_n(t) = \sum_{i=1}^n\big(\sigma_n^2 - 2\sigma_n^21\{X_i^2 \le t_{j(i)}^2\} + \min(X_i^2, t_{j(i)}^2)\big) \tag{47}$$
$$= \sum_{j=J_0}^{J_1}\sum_{i\in I_j}\big(\sigma_n^2 - 2\sigma_n^21\{X_i^2 \le t_j^2\} + \min(X_i^2, t_j^2)\big) \tag{48}$$
$$= \sum_{j=J_0}^{J_1}S_{nj}(t_j). \tag{49}$$
In global thresholding, we will use the first expression. In levelwise thresholding, each $S_{nj}$ is a sum of $n_j$ independent terms, and the different $S_{nj}$'s are independent.

The SureShrink thresholds are defined by minimizing $S_n$. By independence and additivity, this is equivalent in the levelwise case to separately minimizing the $S_{nj}(t_j)$'s over $t_j$. That is, recalling that $r_n = p_n\sigma/\sqrt{n}$,
$$\hat u_n = \mathop{\mathrm{argmin}}_{\rho\le u\le1}S_n(ur_n) \quad\text{and}\quad \hat t_n = \hat u_nr_n \quad\text{(global)}, \tag{50}$$
$$\hat u_{nj} = \mathop{\mathrm{argmin}}_{\rho\le u_j\le1}S_{nj}(u_jr_n) \quad\text{and}\quad \hat t_{nj} = \hat u_{nj}r_n \quad\text{(levelwise)}. \tag{51}$$
We now define
$$B_n(u) = \sqrt{n}\big(L_n(ur_n) - S_n(ur_n)\big). \tag{52}$$
We regard $\{B_n(u) : u \in \mathcal{U}_\rho\}$ as a stochastic process. Let $\rho > 1/\sqrt{2}$. In the global case we take $\mathcal{U}_\rho = [\rho, 1]$. In the levelwise case we take $\mathcal{U}_\rho = [\rho, 1]^\infty$, the set of sequences $(u_1, \ldots, u_k, 1, 1, \ldots)$ for any positive integer $k$ and any $\rho \le u_j \le 1$. This process has mean zero because $S_n$ is an unbiased estimate of risk. The process $B_n$ can be written as
$$B_n(u) = \sum_{i=1}^nZ_{ni}(u_{j(i)}), \tag{53}$$
where $Z_{ni}$ is defined in (41). For levelwise thresholding, $B_n(u)$ is also additive in the threshold components:
$$B_n(u) = \sum_{j=J_0}^{J_1}B_{nj}(u_j) = \sum_{j=J_0}^{J_1}\sum_{i\in I_j}Z_{ni}(u_j). \tag{54}$$
Each $B_{nj}$ has the same basic form as $B_n$: a sum of $n_j$ independent terms.

Lemma 7.1. Let $\mathcal{B}$ be a Besov body with $\gamma > 1/2$ and radius $c > 0$. The process $B_n(u)$ is asymptotically equicontinuous on $\mathcal{U}_\rho$ uniformly over $\mu \in \mathcal{B}$ for any $\rho > 1/\sqrt{2}$, with both global and levelwise thresholding. In fact, it is uniformly asymptotically constant in the sense that, for all $\delta > 0$,
$$\limsup_{n\to\infty}\sup_{\mu\in\mathcal{B}}P^*\Big\{\sup_{u,v\in\mathcal{U}_\rho}|B_n(u) - B_n(v)| > \delta\Big\} = 0. \tag{55}$$

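Proof. As above, let $a_{ni} = v_{ni} - up_n$ and $b_{ni} = v_{ni} + up_n$. From (41) we have, for $0 \le u \le v \le 1$,
$$\frac{\sqrt{n}}{2\sigma^2}\big(Z_{ni}(u) - Z_{ni}(v)\big)$$
$$= (\epsilon_i^2 - 1)(I_{ni}(v) - I_{ni}(u)) - v_{ni}\epsilon_i(I_{ni}(v) - I_{ni}(u)) - up_n\epsilon_i(I^+_{ni}(u) - I^-_{ni}(u)) + vp_n\epsilon_i(I^+_{ni}(v) - I^-_{ni}(v))$$
$$= (\epsilon_i^2 - 1)1\{up_n < |\epsilon_i - v_{ni}| \le vp_n\} - v_{ni}\epsilon_i1\{up_n < |\epsilon_i - v_{ni}| \le vp_n\} - up_n\epsilon_i1\{up_n < \epsilon_i - v_{ni} \le vp_n\} + up_n\epsilon_i1\{-vp_n \le \epsilon_i - v_{ni} < -up_n\} + (v - u)p_n\epsilon_i(I^+_{ni}(v) - I^-_{ni}(v))$$
$$= (\epsilon_i^2 - 1)1\{up_n < |\epsilon_i - v_{ni}| \le up_n + (v - u)p_n\} - b_{ni}\epsilon_i1\{b_{ni} < \epsilon_i \le b_{ni} + (v - u)p_n\} - a_{ni}\epsilon_i1\{a_{ni} - (v - u)p_n \le \epsilon_i < a_{ni}\} + (v - u)p_n\epsilon_i\big[1\{\epsilon_i > b_{ni} + (v - u)p_n\} - 1\{\epsilon_i < a_{ni} - (v - u)p_n\}\big]. \tag{56}$$
From (56) we have that
$$\frac{\sqrt{n}}{2\sigma^2}|Z_{ni}(u) - Z_{ni}(v)| \le (\epsilon_i^2 + (|v_{ni}| + up_n)|\epsilon_i| + 1)1\{up_n < |\epsilon_i - v_{ni}| \le vp_n\} + |v - u|p_n|\epsilon_i|1\{|\epsilon_i - v_{ni}| > vp_n\}$$
$$\le (\epsilon_i^2 + |v_{ni}||\epsilon_i| + 1)1\{up_n < |\epsilon_i - v_{ni}| \le vp_n\} + p_n|\epsilon_i|1\{|\epsilon_i - v_{ni}| > up_n\}$$
$$\le (\epsilon_i^2 + |v_{ni}||\epsilon_i| + 1)1\{\rho p_n < |\epsilon_i - v_{ni}| \le p_n\} + p_n|\epsilon_i|1\{|\epsilon_i - v_{ni}| > \rho p_n\} \equiv \Delta_{ni}, \tag{57}$$
where the last inequality uses $\rho \le u \le v \le 1$.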

Let

$$A_{n0} = \{1 \le i \le n : |v_{ni}| \le 1\}, \qquad A_{n1} = \{1 \le i \le n : 1 < |v_{ni}| \le 2\rho_n\},$$

and

$$A_{n2} = \{1 \le i \le n : |v_{ni}| > 2\rho_n\}.$$

Let $A = A_{n1} \cup A_{n2}$, the set of $i$ such that $|v_{ni}| > 1$. Let $n_0$ be the cardinality of $A$. Let $\beta = 2\gamma$ and note that $\beta > 1$ since $\gamma > 1/2$. The Besov condition implies the following:

$$C n\rho_n^2 \ge \sum_{i=1}^n v_{ni}^2\, i^{\beta} \ge \sum_{i \in A} i^{\beta} \ge \sum_{i=1}^{n_0} i^{\beta} \ge C_2\, n_0^{1+\beta}, \tag{58}$$

where the last inequality holds for large enough $n_0$. It follows from (58) that

$$n_0(n) \le C n^{1/(1+2\gamma)} \rho_n^{2/(1+2\gamma)}, \tag{59}$$

which is $o(\sqrt{n})$.

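From the above, we have in the global thresholding case that

$$
\begin{aligned}
\sup_{g \le u \le v \le 1} |B_n(u) - B_n(v)|
&\le \sum_{i=1}^n \sup_{g \le u \le v \le 1} |Z_{ni}(u) - Z_{ni}(v)| \\
&\le \frac{2\sigma^2}{\sqrt{n}} \sum_{i=1}^n \Big[\big(\epsilon_i^2 + |v_{ni}||\epsilon_i| + 1\big)\,1\{g\rho_n < |\epsilon_i - v_{ni}| \le \rho_n\} + \rho_n|\epsilon_i|\,1\{|\epsilon_i - v_{ni}| > g\rho_n\}\Big].
\end{aligned}
\tag{60}
$$

We break the sum $\sum_{i=1}^n$ into three sums, $\sum_{i \in A_{n0}} + \sum_{i \in A_{n1}} + \sum_{i \in A_{n2}}$, and consider them one at a time.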

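For the case where $|v_{ni}| \le 1$, we have the following:

$$
\begin{aligned}
&\frac{2\sigma^2}{\sqrt{n}} \sum_{i \in A_{n0}} \Big[\big(\epsilon_i^2 + |v_{ni}||\epsilon_i| + 1\big)\,1\{g\rho_n < |\epsilon_i - v_{ni}| \le \rho_n\} + \rho_n|\epsilon_i|\,1\{|\epsilon_i - v_{ni}| > g\rho_n\}\Big] \\
&\qquad \le \frac{2\sigma^2}{\sqrt{n}} \sum_{i \in A_{n0}} \big(\epsilon_i^2 + (1 + \rho_n)|\epsilon_i| + 1\big)\,1\{|\epsilon_i| > g\rho_n - 1\}.
\end{aligned}
$$

Let $t_n = g\rho_n - 1$. By (72) and (73), the expected value of each summand is

$$E\big(\epsilon_i^2 + (1 + \rho_n)|\epsilon_i| + 1\big)\,1\{|\epsilon_i| > g\rho_n - 1\} = 2(t_n + \rho_n + 1)\phi(t_n) + 4\big(1 - \Phi(t_n)\big) = o(n^{-1/2}).$$

Since $A_{n0}$ has at most $n$ elements, the expected value of the entire sum is at most $(2\sigma^2/\sqrt{n}) \cdot n \cdot o(n^{-1/2}) = o(1)$, so the sum goes to zero as well. To see the last equality above, note that, since $\rho_n^2 = 2\log n$, there exists $\delta > 0$ such that

$$\phi(t_n) = \frac{1}{\sqrt{2\pi}} \exp\Big\{-\frac{(g\rho_n - 1)^2}{2}\Big\} = \frac{1}{\sqrt{2\pi e}}\, e^{-g^2\rho_n^2/2 + g\rho_n} = \frac{1}{\sqrt{2\pi e}}\, n^{\sqrt{2}g/\sqrt{\log n} - g^2} = O(n^{-1/2-\delta}),$$

because $\sqrt{2}g/\sqrt{\log n} - g^2 < -1/2 - \delta$ for large enough $n$, where $\delta = |g^2 - 1/2|/2$. It follows that $\rho_n\phi(t_n) = o(n^{-1/2})$. The same holds for $1 - \Phi(t_n) \sim \phi(t_n)/t_n$.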

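For the case where $1 < |v_{ni}| \le 2\rho_n$, we have the following:

$$
\begin{aligned}
&\frac{2\sigma^2}{\sqrt{n}} \sum_{i \in A_{n1}} \Big[\big(\epsilon_i^2 + |v_{ni}||\epsilon_i| + 1\big)\,1\{g\rho_n < |\epsilon_i - v_{ni}| \le \rho_n\} + \rho_n|\epsilon_i|\,1\{|\epsilon_i - v_{ni}| > g\rho_n\}\Big] \\
&\qquad \le \frac{2\sigma^2}{\sqrt{n}} \sum_{i \in A_{n1}} \big(\epsilon_i^2 + 3\rho_n|\epsilon_i| + 1\big).
\end{aligned}
$$

The expected value of each summand is bounded by $2 + 3\rho_n$. The expected value of the entire sum is thus bounded by

$$\frac{2\sigma^2 (2 + 3\rho_n)\, n_0(n)}{\sqrt{n}} \to 0,$$

because $n_0(n)\rho_n/\sqrt{n} \to 0$.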

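For the case where $2\rho_n < |v_{ni}|$, we have the following from (57):

$$
\begin{aligned}
&\frac{2\sigma^2}{\sqrt{n}} \sum_{i \in A_{n2}} \Big[\big(\epsilon_i^2 + |v_{ni}||\epsilon_i| + 1\big)\,1\{g\rho_n < |\epsilon_i - v_{ni}| \le \rho_n\} + \rho_n|\epsilon_i|\,1\{|\epsilon_i - v_{ni}| > g\rho_n\}\Big] \\
&\qquad \le \frac{2\sigma^2}{\sqrt{n}} \Big[\sum_{i \in A_{n2}} \big(\epsilon_i^2 + 2\rho_n|\epsilon_i| + 1\big) + \sum_{i \in A_{n2}} \big(|v_{ni}| - \rho_n\big)|\epsilon_i|\,1\{|\epsilon_i| > |v_{ni}| - \rho_n\}\Big].
\end{aligned}
$$

The expected value of the summands in the first term is bounded by $2 + 2\rho_n$. The expected value of the summands in the second term is bounded by $2(|v_{ni}| - \rho_n)\phi(|v_{ni}| - \rho_n)$. Since $x\phi(x)$ is bounded, the expected value of the entire sum is bounded by

$$\frac{2\sigma^2}{\sqrt{n}} \sum_{i \in A_{n2}} \Big(2 + 2\rho_n + 2\big(|v_{ni}| - \rho_n\big)\phi\big(|v_{ni}| - \rho_n\big)\Big) \to 0,$$

because $\gamma > 1/2$ implies $n_0(n)\rho_n/\sqrt{n} \to 0$.

We have shown that $E\sup_{g \le u \le v \le 1} |B_n(u) - B_n(v)| \to 0$. The result follows for all $\delta > 0$ by Markov's inequality.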
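The algebra in (56) and the bound (57) can be spot-checked numerically. The sketch below works with the rescaled quantity $(\sqrt{n}/\sigma^2) Z_{ni}(u)$. Because (41) is not reproduced here, it writes this quantity out in the form implied by the first line of (56). It then compares both sides of (56), and checks the bound $\Delta_{ni}$, on random draws of $\epsilon_i$, $v_{ni}$ and $g \le u \le v \le 1$. The values of $g$ and $\rho$ and the sampling ranges are arbitrary choices for illustration.

```python
import random


def z_scaled(eps, v_ni, u, rho):
    """(sqrt(n) / sigma^2) * Z_ni(u):
    (eps^2 - 1)(1 - 2I) + 2 v_ni eps I - 2 u rho eps (I+ - I-), with
    I = 1{|eps - v_ni| <= u rho}, I+ = 1{eps - v_ni > u rho}, I- = 1{eps - v_ni < -u rho}."""
    d = eps - v_ni
    inside = 1.0 if abs(d) <= u * rho else 0.0
    sign = (1.0 if d > u * rho else 0.0) - (1.0 if d < -u * rho else 0.0)
    return (eps**2 - 1) * (1 - 2 * inside) + 2 * v_ni * eps * inside - 2 * u * rho * eps * sign


def rhs_56(eps, v_ni, u, v, rho):
    """Last expression in (56), with a = v_ni - u rho, b = v_ni + u rho, w = (v - u) rho."""
    a, b, w = v_ni - u * rho, v_ni + u * rho, (v - u) * rho
    band = 1.0 if u * rho < abs(eps - v_ni) <= v * rho else 0.0
    upper = 1.0 if b < eps <= b + w else 0.0
    lower = 1.0 if a - w <= eps < a else 0.0
    tails = (1.0 if eps > b + w else 0.0) - (1.0 if eps < a - w else 0.0)
    return (eps**2 - 1) * band - b * eps * upper - a * eps * lower + w * eps * tails


def delta_57(eps, v_ni, g, rho):
    """The (u, v)-free bound Delta_ni from (57)."""
    d = abs(eps - v_ni)
    first = (eps**2 + abs(v_ni) * abs(eps) + 1) * (1.0 if g * rho < d <= rho else 0.0)
    return first + rho * abs(eps) * (1.0 if d > g * rho else 0.0)


def check(trials=20000, g=0.75, rho=3.0, seed=0):
    """Largest identity error in (56), and whether (57) held on every draw."""
    rng = random.Random(seed)
    worst, bound_ok = 0.0, True
    for _ in range(trials):
        eps, v_ni = rng.gauss(0.0, 1.0), rng.uniform(-8.0, 8.0)
        u, v = sorted((rng.uniform(g, 1.0), rng.uniform(g, 1.0)))
        lhs = 0.5 * (z_scaled(eps, v_ni, u, rho) - z_scaled(eps, v_ni, v, rho))
        worst = max(worst, abs(lhs - rhs_56(eps, v_ni, u, v, rho)))
        bound_ok = bound_ok and abs(lhs) <= delta_57(eps, v_ni, g, rho) + 1e-12
    return worst, bound_ok


print(check())
```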

Next, consider the levelwise thresholding case. The product space $\mathcal{U}_g = [g,1]^\infty$ is the set of sequences $(u_1, \ldots, u_k, 1, 1, \ldots)$ over positive integers $k$ and $g \le u_j \le 1$. By Tychonoff's theorem, this space is compact and thus totally bounded, so $\mathcal{U}_g$ is totally bounded under the product metric $d(u,v) = \sum_{\ell=1}^\infty 2^{-\ell}|u_\ell - v_\ell|$. For $u \in \mathcal{U}^\infty$, define

$$B_n(u) = \sum_{i=1}^n Z_{ni}(u_{j(i)}).$$

It follows that, for any $u, v \in \mathcal{U}^\infty$, $d(u,v) \le 1 - g$ and

$$
\begin{aligned}
|B_n(u) - B_n(v)| &\le \sum_{i=1}^n \big|Z_{ni}(u_{j(i)}) - Z_{ni}(v_{j(i)})\big| && (61)\\
&\le \sum_{i=1}^n \sup_{g \le u, v \le 1} |Z_{ni}(u) - Z_{ni}(v)| && (62)\\
&\le \frac{2\sigma^2}{\sqrt{n}} \sum_{i=1}^n \Delta_{ni}, && (63)
\end{aligned}
$$

where $\Delta_{ni}$ is the $(u,v)$-independent bound established above in (57). The result above shows that

$$E \sup_{u,v \in \mathcal{U}_g} |B_n(u) - B_n(v)| \to 0. \tag{64}$$

This implies that $B_n$ is asymptotically constant (and thus equicontinuous) on $\mathcal{U}_g$. □
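A numerical companion to the $A_{n0}$ case in the proof above uses three functions. The first evaluates the closed form of the expectation obtained from (72) and (73), and the second recomputes it by quadrature. The third gives the value of $\log n$ beyond which the inequality $\sqrt{2}g/\sqrt{\log n} - g^2 < -1/2 - \delta$, with $\delta = (g^2 - 1/2)/2$, holds. The helper names are hypothetical, and only the standard library is used.

```python
import math


def phi(x):
    return math.exp(-0.5 * x * x) / math.sqrt(2.0 * math.pi)


def Phi(x):
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def closed_form(t, rho):
    """E[(eps^2 + (1 + rho)|eps| + 1) 1{|eps| > t}] = 2(t + rho + 1) phi(t) + 4(1 - Phi(t))."""
    return 2.0 * (t + rho + 1.0) * phi(t) + 4.0 * (1.0 - Phi(t))


def by_simpson(t, rho, upper=40.0, m=20000):
    """Same expectation by composite Simpson on (t, upper), doubled by symmetry (t >= 0)."""
    h = (upper - t) / m
    f = lambda e: (e * e + (1.0 + rho) * e + 1.0) * phi(e)
    s = f(t) + f(upper) + sum((4.0 if k % 2 else 2.0) * f(t + k * h) for k in range(1, m))
    return 2.0 * s * h / 3.0


def log_n_needed(g):
    """Smallest log n with sqrt(2) g / sqrt(log n) - g^2 <= -1/2 - delta,
    where delta = (g^2 - 1/2) / 2; requires g > 1/sqrt(2)."""
    return 8.0 * g * g / (g * g - 0.5) ** 2


print(closed_form(2.0, 3.0), by_simpson(2.0, 3.0))
print(log_n_needed(0.8), log_n_needed(0.9))
```

For $g = 0.8$ the last function returns about 261. The specific inequality used in the argument therefore only takes effect for $n > e^{261}$; the conclusion $\rho_n\phi(t_n) = o(n^{-1/2})$ is purely asymptotic.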



Lemma 7.2. Let $\mathcal{B}$ be a Besov body with $\gamma > 1/2$ and radius $c > 0$. For any fixed $u_1, \ldots, u_k$ in either global or levelwise thresholding, the vector $(B_n(u_1), \ldots, B_n(u_k))$ converges in distribution to a mean zero Gaussian on $\mathbb{R}^k$, uniformly over $\mu \in \mathcal{B}$, in the sense that

$$\sup_{\mu \in \mathcal{B}} m\Big(\mathcal{L}\big(B_n(u_1), \ldots, B_n(u_k)\big),\, N\big(0, \Sigma(u_1, \ldots, u_k; \mu)\big)\Big) \to 0.$$

Here $m$ is any metric on the probability measures on $\mathbb{R}^k$ that metrizes weak convergence, and $\Sigma$ represents a limiting covariance matrix, possibly different for each $\mu$.



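Proof. We begin by showing that the Lindeberg condition holds uniformly over $\mu \in \mathcal{B}$ and over $g \le u \le 1$.

First consider global thresholding. Define $\|Z_{ni}\| = \sup_{0 \le u \le 1} |Z_{ni}(u)|$. Recall that $EZ_{ni} = 0$ for all $n$ and $i$. Now by (41) and (42),

$$
\begin{aligned}
Z_{ni}^2(u) &\le \frac{2\sigma^4}{n}\Big[(\epsilon_i^2 - 1)^2 + 4u^2\rho_n^2\epsilon_i^2\big(1 - I_{ni}(u)\big) + 4v_{ni}^2\epsilon_i^2 I_{ni}(u)\Big] \\
&\le \frac{2\sigma^4}{n}(\epsilon_i^2 - 1)^2 + \frac{8\sigma^4}{n}\rho_n^2\epsilon_i^2 + \frac{8\sigma^4}{n} v_{ni}^2\epsilon_i^2\,1\{|\epsilon_i - v_{ni}| \le \rho_n\} \\
&\equiv N_1 + N_2 + N_3.
\end{aligned}
$$

Note that none of $N_1$, $N_2$ or $N_3$ depends on $u$. Hence,

$$\|Z_{ni}\|^2\,1\{\|Z_{ni}\| > \eta\} \le (N_1 + N_2 + N_3)\,1\{N_1 + N_2 + N_3 > \eta^2\} \le \sum_{r=1}^3 \sum_{s=1}^3 N_r J_s, \tag{65}$$

where $J_s = 1\{N_s > \eta^2/3\}$. We will now show that the nine terms in (65) are exponentially small in $\sqrt{n}$, which implies that the Lindeberg condition holds. First,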

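$$P\Big\{N_1 > \frac{\eta^2}{3}\Big\} = P\Big\{|\epsilon_i^2 - 1| > s_n\Big\} \le 2\exp\Big\{-\frac{s_n \min(s_n, 1)}{8}\Big\}, \qquad s_n = \frac{\eta\sqrt{n}}{\sqrt{6}\,\sigma^2},$$

using the fact that $P\{|\chi^2_1 - 1| > t\} \le 2e^{-t\min(t,1)/8}$. To bound $N_2$, we use Mills' ratio:

$$P\Big\{N_2 > \frac{\eta^2}{3}\Big\} \le P\Big\{|\epsilon_i| > \frac{\eta\sqrt{n}}{\sqrt{24}\,\sigma^2\rho_n}\Big\} \le \frac{2\sqrt{24}\,\sigma^2\rho_n}{\sqrt{2\pi}\,\eta\sqrt{n}}\, e^{-n\eta^2/(48\sigma^4\rho_n^2)}.$$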

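For the third term, if $\mu_i = 0$, then $N_3 = 0$. If $\mu_i \ne 0$,

$$P\Big\{N_3 > \frac{\eta^2}{3}\Big\} = P\Big(\{|\epsilon_i - v_{ni}| \le \rho_n\} \cap \Big\{\epsilon_i^2 > \frac{n\eta^2}{24\sigma^4 v_{ni}^2}\Big\}\Big) \equiv b(\mu_i).$$

An elementary argument shows that $b(\mu_i) = 0$ unless $|\mu_i| > \mu^*$, where

$$\mu^* = \frac{\sigma}{2\sqrt{n}}\Big(-\rho_n + \sqrt{\rho_n^2 + \frac{4\eta\sqrt{n}}{\sqrt{24}\,\sigma^2}}\Big)$$

is the value of $|\mu_i|$ at which $|v_{ni}|(|v_{ni}| + \rho_n) = \eta\sqrt{n}/(\sqrt{24}\,\sigma^2)$. Now, for all large $n$,

$$b(\mu_i) \le P\Big\{|\epsilon_i| \ge \frac{\sqrt{n}\,\mu^*}{\sigma} - \rho_n\Big\} \le \exp\big(-K\eta\sqrt{n}\big),$$

for a constant $K$ depending only on $\sigma$. These inequalities show that, for $\eta > 0$ and for $s = 1, 2, 3$, $EJ_s \le K_1\exp\big(-K_2\min(\eta, \eta^2)\sqrt{n}\big)$. We also have $\sqrt{EN_1^2} \le K_3/n$, $\sqrt{EN_2^2} \le \rho_n^2 K_4/n$ and $\sqrt{EN_3^2} \le \rho_n^2 K_5/n$. The Cauchy-Schwarz inequality and (65) therefore show that, for $\eta > 0$,

$$E\sum_{i=1}^n \|Z_{ni}\|^2\,1\{\|Z_{ni}\| > \eta\} \le K_6(\sigma, \rho, c)\exp\big(-K_7(\sigma, \rho)\min(\eta, \eta^2)\sqrt{n}\big). \tag{66}$$

Here the constants $K_j$ depend, at most, on $\sigma$. It follows that the Lindeberg condition holds uniformly by applying the Cauchy-Schwarz inequality to (65).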
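The two tail bounds used above for $N_1$ and $N_2$ can be checked numerically with the standard library. The probability $P\{|\chi^2_1 - 1| > t\}$ has a closed form through `math.erf` and `math.erfc`, since $P\{\chi^2_1 \le x\} = \mathrm{erf}(\sqrt{x/2})$. Likewise $P\{|\epsilon| > x\} = \mathrm{erfc}(x/\sqrt{2})$. The grids of $t$ and $x$ values are arbitrary.

```python
import math


def chi2_1_tail(t):
    """P(|chi^2_1 - 1| > t), using P(chi^2_1 <= x) = erf(sqrt(x / 2))."""
    upper = math.erfc(math.sqrt((1.0 + t) / 2.0))                   # P(chi2 > 1 + t)
    lower = math.erf(math.sqrt((1.0 - t) / 2.0)) if t < 1 else 0.0  # P(chi2 < 1 - t)
    return upper + lower


def chi2_bound(t):
    """The bound 2 exp(-t min(t, 1) / 8) quoted in the text."""
    return 2.0 * math.exp(-t * min(t, 1.0) / 8.0)


def normal_tail(x):
    """P(|eps| > x) for a standard normal eps."""
    return math.erfc(x / math.sqrt(2.0))


def mills_bound(x):
    """2 phi(x) / x, the Mills' ratio bound on P(|eps| > x) for x > 0."""
    return 2.0 * math.exp(-0.5 * x * x) / (x * math.sqrt(2.0 * math.pi))


ts = [0.01 * k for k in range(1, 5001)]   # 0.01, ..., 50
xs = [0.05 * k for k in range(1, 401)]    # 0.05, ..., 20
print(all(chi2_1_tail(t) <= chi2_bound(t) for t in ts),
      all(normal_tail(x) <= mills_bound(x) for x in xs))
```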

Write $B_n(u) = B_{n;\mu}(u)$ to emphasize the dependence on $\mu$, and similarly write $Z_{ni;\mu}(u)$. In particular, let $B_{n;0}(u)$ denote the process with $\mu_1 = \mu_2 = \cdots = 0$. Let $\mathcal{L}_{n;\mu}(u)$ denote the law of $B_{n;\mu}(u) = \sum_{i=1}^n Z_{ni;\mu}(u)$, and let $N$ denote a Normal with mean $0$ and variance $2\sigma^4$. By the triangle inequality,

$$m\big(\mathcal{L}_{n;\mu}(u), N\big) \le m\big(\mathcal{L}_{n;0}(u), N\big) + m\big(\mathcal{L}_{n;\mu}(u), \mathcal{L}_{n;0}(u)\big),$$

where $m(\cdot,\cdot)$ denotes the Prohorov metric. By the uniform Lindeberg condition above, the CLT holds for $\mathcal{L}_{n;0}(u)$. Hence, by Theorem 7.3, $m(\mathcal{L}_{n;0}(u), N) \to 0$. Now we show that

$$\sup_{\mu \in \mathcal{B}} m\big(\mathcal{L}_{n;\mu}(u), \mathcal{L}_{n;0}(u)\big) \to 0. \tag{67}$$
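Note that

$$
\begin{aligned}
\frac{\sqrt{n}}{2\sigma^2}\big|Z_{ni;\mu}(u) - Z_{ni;0}(u)\big|
= \Big|&(\epsilon_i^2 - 1)\big(I_{ni;0}(u) - I_{ni;\mu}(u)\big) + v_{ni}\epsilon_i I_{ni;\mu}(u) \\
&- u\rho_n\epsilon_i\big[\big(I^+_{ni;\mu}(u) - I^-_{ni;\mu}(u)\big) - \big(I^+_{ni;0}(u) - I^-_{ni;0}(u)\big)\big]\Big|.
\end{aligned}
$$

This can be bounded as in the proof of Lemma 7.1, with the sum split over the same three sets $|v_{ni}| \le 1$, $1 < |v_{ni}| \le 2\rho_n$ and $|v_{ni}| > 2\rho_n$. It follows that

$$\sup_{\mu \in \mathcal{B}} E \sup_{g \le u \le 1} \big|B_{n;\mu}(u) - B_{n;0}(u)\big| \le \alpha_n^2, \tag{68}$$

where $\alpha_n \to 0$; note that $\alpha_n$ does not depend on $u$ or $\mu$. Therefore, by Markov's inequality,

$$\sup_{\mu \in \mathcal{B}} \sup_{g \le u \le 1} P\big\{|B_{n;\mu}(u) - B_{n;0}(u)| > \alpha_n\big\} \le \frac{\alpha_n^2}{\alpha_n} = \alpha_n$$

for all large $n$. Recall that, by Strassen's theorem, if $P\{|X - Y| > \epsilon\} \le \epsilon$, then the marginal laws of $X$ and $Y$ are no more than $\epsilon$ apart in Prohorov distance. Hence,

$$\sup_{\mu \in \mathcal{B}} \sup_{g \le u \le 1} m\big(\mathcal{L}_{n;\mu}(u), \mathcal{L}_{n;0}(u)\big) \le \alpha_n \to 0. \tag{69}$$

This establishes the lemma for one $u$. When $B_n(u_1, \ldots, u_k)$ is an $\mathbb{R}^k$-valued process for some fixed $k$,

$$E\big\|B_{n;\mu}(u_1, \ldots, u_k) - B_{n;0}(u_1, \ldots, u_k)\big\| \le k\, E\sup_{g \le u \le 1} \big|B_{n;\mu}(u) - B_{n;0}(u)\big|, \tag{70}$$

so by (68) the supremum over $\mu \in \mathcal{B}$ of the former is bounded by $k\alpha_n^2$. Since $k$ is fixed, the result follows. Thus, (67) holds for any finite-dimensional marginal.

The same method shows that the result also holds in the levelwise case.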
□

Theorem 7.2. For any Besov body with $\gamma > 1/2$ and radius $c > 0$ and for any $1/\sqrt{2} < g < 1$, there is a mean zero Gaussian process $W$ such that $B_n \rightsquigarrow W$ uniformly over $\mu \in \mathcal{B}$, in the sense that

$$\sup_{\mu \in \mathcal{B}} m\big(\mathcal{L}(B_n), \mathcal{L}(W)\big) \to 0, \tag{71}$$

where $m$ is any metric that metrizes weak convergence on $\ell^\infty[g,1]$.

Proof. The result follows from the preceding lemmas in both the global and levelwise cases. Lemmas 7.2 and 7.3 show that the finite-dimensional distributions of the process converge to Gaussian limits. Lemma 7.1 proves asymptotic equicontinuity. It follows that $B_n$ converges weakly to a tight Gaussian process $W$. □

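7.3. The variance and covariance of $B_n$. Recall that $r_n = \rho_n\sigma/\sqrt{n}$, $v_{ni} = -\sqrt{n}\mu_i/\sigma$, $a_{ni} = v_{ni} - u\rho_n$ and $b_{ni} = v_{ni} + u\rho_n$. Also define

$$D_1(s,t) = \int_s^t \epsilon\,\phi(\epsilon)\,d\epsilon = \phi(s) - \phi(t), \tag{72}$$

$$D_2(s,t) = \int_s^t \epsilon^2\phi(\epsilon)\,d\epsilon = s\phi(s) - t\phi(t) + \Phi(t) - \Phi(s), \tag{73}$$

$$D_3(s,t) = \int_s^t \epsilon(\epsilon^2 - 1)\phi(\epsilon)\,d\epsilon = (s^2 + 1)\phi(s) - (t^2 + 1)\phi(t), \tag{74}$$

$$D_4(s,t) = \int_s^t (\epsilon^2 - 1)^2\phi(\epsilon)\,d\epsilon = 2\big(\Phi(t) - \Phi(s)\big) + s(s^2 + 1)\phi(s) - t(t^2 + 1)\phi(t). \tag{75}$$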
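The closed forms (72) to (75) follow from integrating by parts against $\phi$. The sketch below checks them against composite Simpson quadrature on a few finite intervals. The function names mirror the text; the quadrature routine and test intervals are illustrative choices.

```python
import math


def phi(x):
    return math.exp(-0.5 * x * x) / math.sqrt(2.0 * math.pi)


def Phi(x):
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


# Closed forms (72)-(75), valid for finite s < t.
def D1(s, t):
    return phi(s) - phi(t)


def D2(s, t):
    return s * phi(s) - t * phi(t) + Phi(t) - Phi(s)


def D3(s, t):
    return (s * s + 1) * phi(s) - (t * t + 1) * phi(t)


def D4(s, t):
    return 2 * (Phi(t) - Phi(s)) + s * (s * s + 1) * phi(s) - t * (t * t + 1) * phi(t)


def simpson(f, s, t, m=2000):
    """Composite Simpson rule with an even number m of panels."""
    h = (t - s) / m
    total = f(s) + f(t) + sum((4 if k % 2 else 2) * f(s + k * h) for k in range(1, m))
    return total * h / 3


INTEGRANDS = {
    D1: lambda e: e * phi(e),
    D2: lambda e: e * e * phi(e),
    D3: lambda e: e * (e * e - 1) * phi(e),
    D4: lambda e: (e * e - 1) ** 2 * phi(e),
}


def max_error(pairs=((-1.3, 0.4), (0.2, 2.7), (-3.0, 3.0), (-6.0, -1.0))):
    """Largest gap between closed form and quadrature over all D_k and intervals."""
    return max(abs(D(s, t) - simpson(f, s, t)) for D, f in INTEGRANDS.items() for s, t in pairs)


print(max_error())
```

For the infinite limits used below, set $\phi(\pm\infty) = 0$, $\Phi(-\infty) = 0$ and $\Phi(\infty) = 1$ by hand. Passing `math.inf` directly would produce `nan` from `inf * 0`.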
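Let $K_n(u,v) = \mathrm{Cov}(B_n(u), B_n(v))$. It follows from (42) that $K_n(u,u) = \sum_{i=1}^n EZ_{ni}^2(u)$, where

$$
\begin{aligned}
EZ_{ni}^2(u) &= \frac{2\sigma^4}{n}\Big[1 + 2v_{ni}^2 D_2(a_{ni}, b_{ni}) + 2u^2\rho_n^2\big(1 - D_2(a_{ni}, b_{ni})\big) - 2v_{ni} D_3(a_{ni}, b_{ni}) \\
&\qquad\quad + 2u\rho_n\big(D_3(-\infty, a_{ni}) - D_3(b_{ni}, \infty)\big)\Big] \\
&= \frac{2\sigma^4}{n}\Big[1 + 2u^2\rho_n^2 + 2a_{ni}b_{ni} D_2(a_{ni}, b_{ni}) - 2b_{ni}(a_{ni}^2 + 1)\phi(a_{ni}) + 2a_{ni}(b_{ni}^2 + 1)\phi(b_{ni})\Big] \\
&= \frac{2\sigma^4}{n}\Big[1 + 2u^2\rho_n^2 + 2a_{ni}b_{ni}\big(\Phi(b_{ni}) - \Phi(a_{ni})\big) + 2a_{ni}^2 b_{ni}\phi(a_{ni}) - 2a_{ni}b_{ni}^2\phi(b_{ni}) \\
&\qquad\quad - 2b_{ni}(a_{ni}^2 + 1)\phi(a_{ni}) + 2a_{ni}(b_{ni}^2 + 1)\phi(b_{ni})\Big] \\
&= \frac{2\sigma^4}{n}\Big[1 + \underbrace{2u^2\rho_n^2 + 2a_{ni}b_{ni}\big(\Phi(b_{ni}) - \Phi(a_{ni})\big)}_{I} + \underbrace{2a_{ni}\phi(b_{ni}) - 2b_{ni}\phi(a_{ni})}_{II}\Big] \\
&\equiv \frac{2\sigma^4}{n}\big[1 + I + II\big].
\end{aligned}
$$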
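The last line of the display says that $(n/2\sigma^4)\,EZ_{ni}^2(u) = 1 + I + II$, where $I + II$ depends on $\mu_i$ only through $v_{ni}$. The sketch below compares that closed form with direct numerical integration of the square of each branch of $Z_{ni}(u)$. The branches are: thresholded ($a_{ni} \le \epsilon_i \le b_{ni}$), or kept with either sign. The closed form is called `g_n`, the name used in the proof of Theorem 7.3. The branch expressions are read off from the first line of (56), and the parameter values are arbitrary.

```python
import math


def phi(x):
    return math.exp(-0.5 * x * x) / math.sqrt(2.0 * math.pi)


def Phi(x):
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def g_n(x, u, rho):
    """I + II from the display above, as a function of x = v_ni."""
    a, b = x - u * rho, x + u * rho
    return (2 * u * u * rho * rho + 2 * a * b * (Phi(b) - Phi(a))
            + 2 * a * phi(b) - 2 * b * phi(a))


def simpson(f, s, t, m=8000):
    h = (t - s) / m
    total = f(s) + f(t) + sum((4 if k % 2 else 2) * f(s + k * h) for k in range(1, m))
    return total * h / 3


def scaled_second_moment(v_ni, u, rho, lo=-20.0, hi=20.0):
    """(n / sigma^4) E Z_ni(u)^2, integrating each branch separately:
    eps < a: kept with X_i < 0; a <= eps <= b: thresholded to zero;
    eps > b: kept with X_i > 0."""
    a, b = v_ni - u * rho, v_ni + u * rho
    left = lambda e: ((e * e - 1) + 2 * u * rho * e) ** 2 * phi(e)
    middle = lambda e: (-(e * e - 1) + 2 * v_ni * e) ** 2 * phi(e)
    right = lambda e: ((e * e - 1) - 2 * u * rho * e) ** 2 * phi(e)
    return simpson(left, lo, a) + simpson(middle, a, b) + simpson(right, b, hi)


for v in (0.0, 1.5, -4.0, 7.0):
    print(v, scaled_second_moment(v, 0.9, 3.0) / 2, 1 + g_n(v, 0.9, 3.0))
```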



Theorem 7.3. Let $\mathcal{B}$ be a Besov ball with $\gamma > 1/2$ and radius $c > 0$. Then,

$$\lim_{n\to\infty} \sup_{\mu \in \mathcal{B}} \Big|\sum_{i=1}^n EZ_{ni}^2(u) - 2\sigma^4\Big| = 0.$$



Proof. Apply Lemma 7.4 to the sum of terms $I$ and $II$. Since $EZ_{ni}^2(u) = (2\sigma^4/n)[1 + I + II]$, we have $\sum_{i=1}^n EZ_{ni}^2(u) - 2\sigma^4 = 2\sigma^4\,\frac{1}{n}\sum_{i=1}^n g_n(v_{ni})$, where

$$g_n(x) = 2u^2\rho_n^2 + 2(x^2 - u^2\rho_n^2)\big(\Phi(x + u\rho_n) - \Phi(x - u\rho_n)\big) + 2(x - u\rho_n)\phi(x + u\rho_n) - 2(x + u\rho_n)\phi(x - u\rho_n).$$

We have $g_n(0) \to 0$ because $g_n(0) = 4u^2\rho_n^2\big(1 - \Phi(u\rho_n)\big) - 4u\rho_n\phi(u\rho_n)$. Hence, by Mills' ratio, $|g_n(0)| \le 4u\rho_n\phi(u\rho_n) \to 0$.

Now, if $|x| > 2\rho_n$, then by Mills' inequality $|g_n(x)| \le C\rho_n^2$. If $|x| \le 2\rho_n$, the same holds because each term is of order $\rho_n^2$. Hence, $\|g_n\|_\infty = O(\log n)$. For $x$ in a neighborhood of zero,

$$|g_n(x) - g_n(0)| \le |g_n'(\xi)|\,|x| \quad\text{for some } |\xi| \le |x|.$$

Hence,

$$\sup_n |g_n(x) - g_n(0)| \le |x| \sup_n \sup_{|\xi| \le |x|} |g_n'(\xi)|.$$

By direct calculation, for $\epsilon > 0$ and $\delta = \min(\epsilon, 1/8)$, $\sup_n \sup_{|\xi| \le \delta} |g_n'(\xi)| \le 1$, so $|x| < \delta$ implies $\sup_n |g_n(x) - g_n(0)| < \epsilon$. Thus, $(g_n)$ is an equicontinuous family of functions at 0.

By Lemma 7.4, the result follows. □
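For $\mu = 0$, Lemma 7.2 and Theorem 7.3 together say that $B_n(u)$ is approximately $N(0, 2\sigma^4)$. A small Monte Carlo sketch makes this concrete. It assumes $\rho_n = \sqrt{2\log n}$ and uses the $\mu = 0$ form of $Z_{ni}(u)$, so that $v_{ni} = 0$. The sample size, number of replications and seed are arbitrary, and the output is only a rough check.

```python
import math
import random


def simulate_B(n, u, sigma, reps, seed=0):
    """Draw reps copies of B_n(u) = sum_i Z_ni(u) under mu = 0 (so v_ni = 0), with
    Z_ni(u) = sigma^2/sqrt(n) [ (eps^2 - 1)(1 - 2I) - 2 u rho eps (I+ - I-) ],
    I = 1{|eps| <= u rho}. rho_n = sqrt(2 log n) is an assumption."""
    rng = random.Random(seed)
    rho = math.sqrt(2 * math.log(n))
    scale = sigma ** 2 / math.sqrt(n)
    draws = []
    for _ in range(reps):
        total = 0.0
        for _ in range(n):
            e = rng.gauss(0.0, 1.0)
            if abs(e) <= u * rho:          # coefficient thresholded to zero
                total += -(e * e - 1)
            else:                          # kept: eps * (I+ - I-) = |eps|
                total += (e * e - 1) - 2 * u * rho * abs(e)
        draws.append(scale * total)
    return draws


draws = simulate_B(n=500, u=0.9, sigma=1.0, reps=1000)
mean = sum(draws) / len(draws)
var = sum((d - mean) ** 2 for d in draws) / (len(draws) - 1)
print(f"mean ~ {mean:.3f}, variance ~ {var:.3f} (limit in Theorem 7.3: 2 sigma^4 = 2)")
```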

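Lemma 7.3. Let $\mathcal{B}$ be a Besov body with $\gamma > 1/2$ and radius $c > 0$. Then the function $K_n(u,v) = \mathrm{Cov}(B_n(u), B_n(v))$ converges to a well-defined limit uniformly over $\mu \in \mathcal{B}$:

$$\lim_{n\to\infty} \sup_{\mu \in \mathcal{B}} \big|K_n(u,v) - 2\sigma^4\big| = 0.$$

Proof. Theorem 7.3 proves the result for $u = v$. Let $g \le u < v \le 1$. Let $\bar a_{ni} = v_{ni} - v\rho_n$ and $\bar b_{ni} = v_{ni} + v\rho_n$, so that $\bar a_{ni} < a_{ni} < b_{ni} < \bar b_{ni}$. Write $J_{ni}(u,v) = 1\{b_{ni} < \epsilon_i \le \bar b_{ni}\}$ and $J_{ni}(-v,-u) = 1\{\bar a_{ni} \le \epsilon_i < a_{ni}\}$ for the events that the coefficient survives the threshold $ur_n$ but not $vr_n$. Then by (41),

$$
\begin{aligned}
\frac{n}{\sigma^4} Z_{ni}(u) Z_{ni}(v) &= (\epsilon_i^2 - 1)^2\big(1 - 2I_{ni}(v) + 2I_{ni}(u)\big) \\
&\quad + 2\epsilon_i(\epsilon_i^2 - 1)\big\{-2v_{ni} I_{ni}(u) + b_{ni} J_{ni}(u,v) + a_{ni} J_{ni}(-v,-u) - (u+v)\rho_n\big(I^+_{ni}(v) - I^-_{ni}(v)\big)\big\} \\
&\quad + 4\epsilon_i^2\big\{v_{ni}^2 I_{ni}(u) - u\rho_n v_{ni}\big(J_{ni}(u,v) - J_{ni}(-v,-u)\big) + uv\rho_n^2\big(1 - I_{ni}(v)\big)\big\}.
\end{aligned}
$$

We then have $K_n(u,v) = \sum_{i=1}^n E\big(Z_{ni}(u)Z_{ni}(v)\big)$, where

$$
\begin{aligned}
E\big(Z_{ni}(u)Z_{ni}(v)\big) = \frac{2\sigma^4}{n}\Big[&1 - D_4(\bar a_{ni}, a_{ni}) - D_4(b_{ni}, \bar b_{ni}) - 2v_{ni} D_3(a_{ni}, b_{ni}) + b_{ni} D_3(b_{ni}, \bar b_{ni}) + a_{ni} D_3(\bar a_{ni}, a_{ni}) \\
&+ (u+v)\rho_n\big(D_3(-\infty, \bar a_{ni}) - D_3(\bar b_{ni}, \infty)\big) + 2v_{ni}^2 D_2(a_{ni}, b_{ni}) \\
&- 2u\rho_n v_{ni}\big(D_2(b_{ni}, \bar b_{ni}) - D_2(\bar a_{ni}, a_{ni})\big) + 2uv\rho_n^2\big(1 - D_2(\bar a_{ni}, \bar b_{ni})\big)\Big].
\end{aligned}
$$

When $u = v$ this reduces to the expression for $EZ_{ni}^2(u)$ above. The proof that this converges is essentially the same as the proof of Theorem 7.3. □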
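Because the covariance formula above is long, it is worth checking numerically. The sketch evaluates it through the closed forms (73) to (75). It then compares the result with a region-by-region integral of $Z_{ni}(u)Z_{ni}(v)$, using the same branch expressions for $Z_{ni}$ as in the proof of Lemma 7.1. Both functions return $(n/2\sigma^4)\,E(Z_{ni}(u)Z_{ni}(v))$ for $u \le v$; the names and test values are illustrative.

```python
import math


def phi(x):
    return math.exp(-0.5 * x * x) / math.sqrt(2.0 * math.pi)


def Phi(x):
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def D2(s, t):
    return s * phi(s) - t * phi(t) + Phi(t) - Phi(s)


def D3(s, t):
    return (s * s + 1) * phi(s) - (t * t + 1) * phi(t)


def D4(s, t):
    return 2 * (Phi(t) - Phi(s)) + s * (s * s + 1) * phi(s) - t * (t * t + 1) * phi(t)


def cov_formula(v_ni, u, v, rho):
    """(n / (2 sigma^4)) E[Z_ni(u) Z_ni(v)] from the closed form above, u <= v."""
    a, b = v_ni - u * rho, v_ni + u * rho
    abar, bbar = v_ni - v * rho, v_ni + v * rho
    d3_left = -(abar * abar + 1) * phi(abar)   # D3(-inf, abar)
    d3_right = (bbar * bbar + 1) * phi(bbar)   # D3(bbar, +inf)
    return (1 - D4(abar, a) - D4(b, bbar) - 2 * v_ni * D3(a, b)
            + b * D3(b, bbar) + a * D3(abar, a)
            + (u + v) * rho * (d3_left - d3_right)
            + 2 * v_ni ** 2 * D2(a, b)
            - 2 * u * rho * v_ni * (D2(b, bbar) - D2(abar, a))
            + 2 * u * v * rho ** 2 * (1 - D2(abar, bbar)))


def simpson(f, s, t, m=8000):
    h = (t - s) / m
    total = f(s) + f(t) + sum((4 if k % 2 else 2) * f(s + k * h) for k in range(1, m))
    return total * h / 3


def cov_numeric(v_ni, u, v, rho, lo=-20.0, hi=20.0):
    """Same quantity by integrating products of the branches of Z_ni, region by region."""
    a, b = v_ni - u * rho, v_ni + u * rho
    abar, bbar = v_ni - v * rho, v_ni + v * rho
    kept_neg = lambda e, s: (e * e - 1) + 2 * s * rho * e   # X_i < -s r_n
    kept_pos = lambda e, s: (e * e - 1) - 2 * s * rho * e   # X_i > s r_n
    zeroed = lambda e: -(e * e - 1) + 2 * v_ni * e          # |X_i| <= s r_n (any s)
    pieces = [
        (lo, abar, lambda e: kept_neg(e, u) * kept_neg(e, v)),
        (abar, a, lambda e: kept_neg(e, u) * zeroed(e)),
        (a, b, lambda e: zeroed(e) ** 2),
        (b, bbar, lambda e: kept_pos(e, u) * zeroed(e)),
        (bbar, hi, lambda e: kept_pos(e, u) * kept_pos(e, v)),
    ]
    return sum(simpson(lambda e: f(e) * phi(e), s, t) for s, t, f in pieces) / 2


for case in ((0.0, 0.8, 0.95), (2.5, 0.75, 1.0), (-6.0, 0.9, 0.92)):
    print(case, cov_formula(*case, 3.0), cov_numeric(*case, 3.0))
```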

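Lemma 7.4. Let $\mathcal{B}$ be a Besov ball with $\gamma > 1/2$. Let $g_n$ be a sequence of functions equicontinuous at 0, with $\|g_n\|_\infty = O((\log n)^\alpha)$ for some $\alpha > 0$, and satisfying $g_n(0) \to a \in \mathbb{R}$. Then

$$\lim_{n\to\infty} \sup_{\mu \in \mathcal{B}} \Big|\frac{1}{n}\sum_{i=1}^n g_n(\mu_i\sqrt{n}) - a\Big| = 0.$$

Proof. Without loss of generality, assume that $a = 0$. Let $M_n = \|g_n\|_\infty$. Fix $\epsilon > 0$. By equicontinuity, there exists $\delta > 0$ such that $|x| < \delta$ implies $|g_n(x) - g_n(0)| < \epsilon/4$ for all $n$. By assumption, there exists an $N$ such that $|g_n(0)| < \epsilon/4$ for $n > N$. Since $\mathcal{B}$ is by assumption a Besov ball, there is a constant $C$ such that, for all $n$, $\sum_{i=1}^n \mu_i^2 i^{2\gamma} \le C^2\log n$ for all $\mu \in \mathcal{B}$. See Cai [(1999), pages 919 and 920] for inequalities that imply this.

Let $v_{ni} = \mu_i\sqrt{n}$. The condition on $\mu$ implies for all $n$ that

$$\sum_{i=1}^n v_{ni}^2\, i^{2\gamma} \le C^2 n\log n.$$

Let the set of such $v_n$'s be denoted by $\mathcal{B}_n$. We thus have

$$\sup_{\mu \in \mathcal{B}} \Big|\frac{1}{n}\sum_{i=1}^n g_n(\mu_i\sqrt{n})\Big| \le \sup_{v_n \in \mathcal{B}_n} \frac{1}{n}\sum_{i=1}^n |g_n(v_{ni})|.$$

Let

$$n_0 = \big\lceil C^{1/\gamma} n^{1/(2\gamma)} (\log n)^{1/(2\gamma)} / \delta^{1/\gamma} \big\rceil.$$

This is less than $n$ and bigger than $N$ for large $n$. Then for $i > n_0$ and $n > N$, $|v_{ni}| \le C\sqrt{n\log n}/i^{\gamma} < \delta$ and $|g_n(v_{ni})| < \epsilon/2$. We have

$$\frac{1}{n}\sum_{i=1}^n |g_n(v_{ni})| \le \frac{n_0 M_n}{n} + \frac{\epsilon}{2} \le \Big(C^{1/\gamma}\delta^{-1/\gamma}\, n^{-(1 - 1/(2\gamma))} (\log n)^{1/(2\gamma)} + \frac{1}{n}\Big) M_n + \frac{\epsilon}{2}.$$

Since $1/(2\gamma) < 1$ and $M_n = O((\log n)^\alpha)$, the first term tends to zero. Thus, as soon as $n$ is large enough that this term is smaller than $\epsilon/2$, we have

$$\sup_{v_n \in \mathcal{B}_n} \frac{1}{n}\sum_{i=1}^n |g_n(v_{ni})| < \epsilon,$$

which proves the lemma. □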
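The key quantitative step in Lemma 7.4 is that $n_0 M_n / n \to 0$ when $\gamma > 1/2$. The sketch below computes $n_0$ from its definition, together with the first term of the bound. It takes $M_n = (\log n)^\alpha$ as the worst case allowed by the hypothesis. The values of $C$, $\gamma$, $\delta$ and $\alpha$ are arbitrary illustrative choices.

```python
import math


def n0(n, C, gamma, delta):
    """n_0 from the proof of Lemma 7.4: beyond this index, any v_n with
    sum_i v_ni^2 i^(2 gamma) <= C^2 n log n has |v_ni| < delta."""
    return math.ceil((C * math.sqrt(n * math.log(n)) / delta) ** (1.0 / gamma))


def first_term(n, C, gamma, delta, alpha):
    """n_0 * M_n / n with the worst-case growth M_n = (log n)^alpha."""
    return n0(n, C, gamma, delta) * math.log(n) ** alpha / n


for n in (10**4, 10**6, 10**8, 10**10):
    print(n, n0(n, 1.0, 1.0, 0.1), first_term(n, 1.0, 1.0, 0.1, 1.0))
```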

7.4. Proofs of main theorems. 

Proof of Theorem 3.1. This follows from Theorems 7.2 and 7.3. The 
last statement follows from Theorem 7.1. □ 

Proof of Theorem 3.2. This follows from Theorems 7.2 and 7.3 and the fact that $B(u) = B(1) + o_P(1)$ uniformly in $g \le u \le 1$ and $\mu \in \mathcal{B}$. The last statement follows from Theorem 7.1. □



Proof of Theorem 3.3. This follows from Theorem 3.2 in Beran and Dümbgen (1998) and Theorem 7.1. □

Proof of Theorem 4.1. Let $m = \hat\sigma/\sigma$. The pivot process with $\hat\sigma$ "plugged in" is

$$
\begin{aligned}
\hat B_n(u) &= \sqrt{n}\sum_{i=1}^n \Big[\big(\mu_i - \hat\mu_i(ur_n m)\big)^2 - \Big(\frac{m^2\sigma^2}{n} - \frac{2m^2\sigma^2}{n}1\{X_i^2 \le u^2r_n^2m^2\} + \min(X_i^2, u^2r_n^2m^2)\Big)\Big] \\
&= B_n(um) - (m^2 - 1)\frac{\sigma^2}{\sqrt{n}}\sum_{i=1}^n \big(1 - 2\cdot 1\{|X_i| \le ur_n m\}\big) \\
&= B_n(u) + o_P(1),
\end{aligned}
$$

uniformly over $u \in \mathcal{U}_g$ and $\mu \in \mathcal{B}$, by Lemmas 7.1 and 4.1. The result follows.

□
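The middle line of the display is pure algebra. Replacing $\sigma^2$ by $\hat\sigma^2 = m^2\sigma^2$ in the risk estimate shifts each summand by $(m^2 - 1)(\sigma^2/n)(1 - 2\cdot 1\{|X_i| \le u r_n m\})$. The sketch below checks this numerically for soft thresholding. It assumes the model $X_i = \mu_i + \sigma\epsilon_i/\sqrt{n}$ and $\rho_n = \sqrt{2\log n}$. The function `pivot` is a hypothetical helper, and $\hat\sigma$ is simply a fixed number, not an estimate.

```python
import math
import random


def soft(x, t):
    """Soft thresholding of a single coefficient at level t."""
    return math.copysign(max(abs(x) - t, 0.0), x)


def pivot(x, mu, t, noise_var, n):
    """sqrt(n) * (loss - risk estimate) for soft thresholding at t; the risk
    estimate uses the per-coefficient noise variance noise_var."""
    loss = sum((soft(xi, t) - mi) ** 2 for xi, mi in zip(x, mu))
    sure = sum(noise_var - 2 * noise_var * (abs(xi) <= t) + min(xi * xi, t * t) for xi in x)
    return math.sqrt(n) * (loss - sure)


rng = random.Random(7)
n, sigma, sigma_hat, u = 256, 1.0, 1.1, 0.9                 # sigma_hat: a made-up value
m = sigma_hat / sigma
r_n = math.sqrt(2 * math.log(n)) * sigma / math.sqrt(n)     # assumes rho_n = sqrt(2 log n)
mu = [rng.uniform(-0.3, 0.3) if i < 20 else 0.0 for i in range(n)]
x = [mi + sigma * rng.gauss(0.0, 1.0) / math.sqrt(n) for mi in mu]

t = u * r_n * m                                             # threshold computed with sigma_hat
plugged = pivot(x, mu, t, sigma_hat ** 2 / n, n)            # hat B_n(u)
true_sigma = pivot(x, mu, t, sigma ** 2 / n, n)             # B_n(u m)
correction = (m * m - 1) * sigma ** 2 / math.sqrt(n) * sum(1 - 2 * (abs(xi) <= t) for xi in x)
print(plugged, true_sigma - correction)
```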

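Proof of Theorem 4.3. Let $\mu_0$ and $\sigma_0$ denote the true values of $\mu$ and $\sigma$, respectively. Then under (S1) we have

$$
\begin{aligned}
P\{\mu_0 \in \mathcal{D}_n\} &\ge P\{\sigma_0 \in \mathcal{Q}_n\}\, P\{\mu_0 \in \mathcal{D}_n \mid \sigma_0 \in \mathcal{Q}_n\} \\
&\ge P\{\sigma_0 \in \mathcal{Q}_n\}\, P\{\mu_0 \in \mathcal{D}_{n,\sigma_0} \mid \sigma_0 \in \mathcal{Q}_n\}.
\end{aligned}
$$

Hence,

$$\liminf_{n\to\infty} \inf_{\mu \in \mathcal{B}} P\{\mu \in \mathcal{D}_n\} \ge \big(\sqrt{1 - \alpha}\big)^2 = 1 - \alpha.$$

Under (S2),

$$
\begin{aligned}
P\{\mu_0 \notin \mathcal{D}_n\} &= P\{\mu_0 \notin \mathcal{D}_n, \sigma_0 \notin \mathcal{Q}_n\} + P\{\mu_0 \notin \mathcal{D}_n, \sigma_0 \in \mathcal{Q}_n\} \\
&\le P\{\sigma_0 \notin \mathcal{Q}_n\} + P\{\mu_0 \notin \mathcal{D}_{n,\sigma_0}, \sigma_0 \in \mathcal{Q}_n\} \\
&\le P\{\sigma_0 \notin \mathcal{Q}_n\} + P\{\mu_0 \notin \mathcal{D}_{n,\sigma_0}\}.
\end{aligned}
$$

Thus,

$$\liminf_{n\to\infty} \inf_{\mu \in \mathcal{B}} P\{\mu \in \mathcal{D}_n\} \ge 1 - (\alpha - \delta) - \delta = 1 - \alpha.$$

This completes the proof.

For the final claim, note that the uniform consistency of $\hat\sigma$ and the asymptotic constancy of $B_n$ (Lemma 7.1) imply that $B(u) = B(1) + o_P(1)$, uniformly in $g \le u \le 1$ and $\mu \in \mathcal{B}$. The theorem follows from Theorems 3.1, 3.2, 3.3 and 4.3. □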



Proof of Theorem 5.2. For any $f \in \mathcal{F}_{n,c}$, we have that

$$|T(f) - T(f^*)| \le |T(f) - T(f_n)| + |T(f_n) - T(f^*)|. \tag{76}$$

Since $\int_a^b \psi_{jk}(x)\,dx = 0$ whenever the support of $\psi_{jk}$ is contained in $[a,b]$, the first term is bounded as follows, with $C'$ denoting a possibly different constant in each expression:

$$|T(f) - T(f_n)| \le \sum_{j=J_1+1}^\infty \sum_{k} |\beta_{jk}|\, \frac{1}{b-a}\Big|\int_a^b \psi_{jk}(x)\,dx\Big| \le \frac{C'}{b-a}\sum_{j=J_1+1}^\infty 2^{-j/2}\max_k |\beta_{jk}| = o\big(n^{c-1}/(\log n)^{\lfloor c\rfloor}\big).$$

For a given $0 \le a < b \le 1$, let $q = \sup\{1 \le m \le n : (m-1)/n \le a\}$ and $r = \inf\{1 \le m \le n : b \le m/n\}$. The second term in (76) is bounded by

$$
\begin{aligned}
|T(f_n) - T(f^*)| &\le \frac{1}{b-a}\Big(\int_{(q-1)/n}^{a} |f_n(x)|\,dx + \int_{b}^{r/n} |f_n(x)|\,dx\Big) \\
&\le \frac{C'}{(b-a)\,n}\Big[\sum_{k=0}^{2^{J_0}-1} |\alpha_k| + \sum_{j=J_0}^{J_1} 2^{j/2}\max_k |\beta_{jk}|\Big] \\
&\le \frac{1}{b-a}\Big(\frac{C_0}{n} + \frac{C'\log n}{n}\Big) \le \frac{C'(1 + \log n)}{(b-a)\,n} = o\big(n^{c-1}/(\log n)^{\lfloor c\rfloor}\big).
\end{aligned}
$$

It follows that the right-hand side of (76) is $o\big(n^{c-1}/(\log n)^{\lfloor c\rfloor}\big)$. The result follows by Theorem 5.1. □

8. Discussion. The expected radius of the confidence ball can be shown to be of order $n^{-1/4}$. This is not surprising since the minimax estimation rate for a Besov space is $n^{-\gamma/(2\gamma+1)}$, which approaches $n^{-1/4}$ as $\gamma$ approaches $1/2$. Moreover, Li (1989) showed that, for nonparametric regression without smoothness constraints, confidence spheres cannot shrink faster than $n^{-1/4}$. Indeed, the presence of the term $\tau/\sqrt{n}$ in the squared radius of our confidence balls implies that the rate cannot be faster than $n^{-1/4}$. This is consistent with the results in Low (1997) and Cai and Low (2003) that suggest confidence sets cannot be rate adaptive. Thus, while we have not shown that our confidence set $\mathcal{D}_n$ is rate optimal, we doubt that the rate can be improved. One consequence of the slow rate of the confidence set is that the arguments that favor threshold estimators over modulators no longer apply.

We have chosen to emphasize confidence balls and simultaneous confi- 
dence sets for functionals. A more traditional approach is to construct an 
interval of the form $\hat f(x) \pm w_n$, where $\hat f(x)$ is an estimate of $f(x)$ and $w_n$ is an
appropriate sequence of constants. This corresponds to taking $T(f) = f(x)$,
the evaluation functional, in Theorem 5.1. There is a rich literature on this 
subject; a recent example in the wavelet framework is Picard and Tribouley 
(2000). Such confidence intervals are pointwise in two senses. First, they 
focus on the regression function at a particular point x, although they can 
be extended into a confidence band. Second, the validity of the asymptotic 
coverage usually only holds for a fixed function $f$: the absolute difference
between the coverage probability and the target 1 — a converges to zero for 
each fixed function, but the supremum of this difference over the function 
space need not converge. Moreover, in this approach one must estimate the 
asymptotic bias of the function estimator or eliminate the bias by under- 
smoothing. While acknowledging that this approach has some appeal and is 
certainly of great value in some cases, we prefer the confidence ball approach 
for several reasons. First, it avoids having to estimate and correct for the 
bias which is often difficult to do in practice and usually entails putting extra 
assumptions on the functions. Second, it produces confidence sets that are
asymptotically uniform over large classes. Third, it leads directly to confidence
sets for classes of functionals which we believe are quite useful in
scientific applications. Of course, we could take the class of functionals $\mathcal{T}$ to
be the set of evaluation functionals $f \mapsto f(x)$, and so our approach does produce
confidence bands too. It is easy to see, however, that without additional 
assumptions on the functions, these bands are hopelessly wide. We should 
also mention that another approach is to construct Bayesian posterior inter- 
vals as in Barber, Nason and Silverman (2002), for example. However, the 
frequentist coverage of such sets is unknown. 

In Section 5 we gave a flavor of how information can be extracted from
the confidence ball $\mathcal{C}_n$ using functionals. Beran (2000) discusses a different
approach to exploring $\mathcal{C}_n$ which he calls "probing the confidence set."
This involves plotting smooth and wiggly representatives from $\mathcal{C}_n$. A generalization
of these ideas is to use families of what we call parametric probes.
These are parameterized functionals tailored to look for specific features of 
the function such as jumps and bumps. In a future paper we will report 
on probes, as well as other practical issues that arise. In particular, we will 
report on confidence sets for other shrinkage schemes besides thresholding 
and linear modulators. 

Acknowledgments. The authors thank the referees for helpful comments.